Preface


The title of the book sounds a bit mysterious. Why should anyone read this 
book if it presents the subject in a wrong way? What is particularly done 
“wrong” in the book? 


Before answering these questions, let me first describe the target
audience of this text. This book appeared as lecture notes for the course
“Honors Linear Algebra”. It is supposed to be a first linear algebra course for
mathematically advanced students. It is intended for a student who, while
not yet very familiar with abstract reasoning, is willing to study more rigorous
mathematics than what is presented in a “cookbook style” calculus type
course. Besides being a first course in linear algebra it is also supposed to be
a first course introducing a student to rigorous proof and formal definitions, in
short, to the style of modern theoretical (abstract) mathematics. The target
audience explains the very specific blend of elementary ideas and concrete
examples, which are usually presented in introductory linear algebra texts,
with more abstract definitions and constructions typical for advanced books.


Another specific feature of the book is that it is not written by or for an
algebraist. So, I tried to emphasize the topics that are important for analysis,
geometry, probability, etc., and did not include some traditional topics. For
example, I am only considering vector spaces over the fields of real or complex
numbers. Linear spaces over other fields are not considered at all, since
I feel the time required to introduce and explain abstract fields would be better
spent on some more classical topics, which will be required in other
disciplines. And later, when the students study general fields in an abstract
algebra course they will understand that many of the constructions studied
in this book will also work for general fields.




Also, I treat only finite-dimensional spaces in this book and a basis 
always means a finite basis. The reason is that it is impossible to say some- 
thing non-trivial about infinite-dimensional spaces without introducing con- 
vergence, norms, completeness etc., i.e. the basics of functional analysis. 
And this is definitely a subject for a separate course (text). So, I do not 
consider infinite Hamel bases here: they are not needed in most applica- 
tions to analysis and geometry, and I feel they belong in an abstract algebra 
course. 


Notes for the instructor. There are several details that distinguish this 
text from standard advanced linear algebra textbooks. The first concerns the
definitions of bases, linearly independent, and generating sets. In the book 
I first define a basis as a system with the property that any vector admits 
a unique representation as a linear combination. And then linear indepen- 
dence and generating system properties appear naturally as halves of the 
basis property, one being uniqueness and the other being existence of the 
representation. 


The reason for this approach is that I feel the concept of a basis is a much 
more important notion than linear independence: in most applications we 
really do not care about linear independence, we need a system to be a basis. 
For example, when solving a homogeneous system, we are not just looking 
for linearly independent solutions, but for the correct number of linearly 
independent solutions, i.e. for a basis in the solution space. 


And it is easy to explain to students why bases are important: they
allow us to introduce coordinates, and work with $\mathbb{R}^n$ (or $\mathbb{C}^n$) instead of
working with an abstract vector space. Furthermore, we need coordinates 
to perform computations using computers, and computers are well adapted 
to working with matrices. Also, I really do not know a simple motivation 
for the notion of linear independence. 


Another detail is that I introduce linear transformations before teaching
how to solve linear systems. A disadvantage is that we do not prove
until Chapter 2 that only a square matrix can be invertible, as well as some
other important facts. However, having already defined linear transformations
allows a more systematic presentation of row reduction. Also, I spend a
lot of time (two sections) motivating matrix multiplication. I hope that I 
explained well why such a strange looking rule of multiplication is, in fact, 
a very natural one, and we really do not have any choice here. 


Many important facts about bases, linear transformations, etc., like the 
fact that any two bases in a vector space have the same number of vectors, 
are proved in Chapter 2 by counting pivots in the row reduction. While most 
of these facts have “coordinate free” proofs, formally not involving Gaussian 
elimination, a careful analysis of the proofs reveals that Gaussian elimination
and counting of the pivots do not disappear, they are just hidden
in most of the proofs. So, instead of presenting very elegant (but not easy 
for a beginner to understand) “coordinate-free” proofs, which are typically 
presented in advanced linear algebra books, we use “row reduction” proofs, 
more common for the “calculus type” texts. The advantage here is that it is 
easy to see the common idea behind all the proofs, and such proofs are easier 
to understand and to remember for a reader who is not very mathematically 
sophisticated. 


I also present in Section 8 of Chapter 2 a simple and easy to remember 
formalism for the change of basis formula. 


Chapter 3 deals with determinants. I spend a lot of time presenting a
motivation for the determinant, and only much later give formal definitions. 
Determinants are introduced as a way to compute volumes. It is shown that 
if we allow signed volumes, to make the determinant linear in each column 
(and at that point students should be well aware that the linearity helps a 
lot, and that allowing negative volumes is a very small price to pay for it), 
and assume some very natural properties, then we do not have any choice 
and arrive at the classical definition of the determinant. I would like to
emphasize that initially I do not postulate antisymmetry of the determinant; 
I deduce it from other very natural properties of volume. 


Note that while formally in Chapters 1-3 I was dealing mainly with real
spaces, everything there holds for complex spaces, and moreover, even for 
the spaces over arbitrary fields. 


Chapter 4 is an introduction to spectral theory, and that is where the
complex space $\mathbb{C}^n$ naturally appears. It was formally defined in the beginning
of the book, and the definition of a complex vector space was also given
there, but before Chapter 4 the main object was the real space $\mathbb{R}^n$. Now
the appearance of complex eigenvalues shows that for spectral theory the
most natural space is the complex space $\mathbb{C}^n$, even if we are initially dealing
with real matrices (operators in real spaces). The main accent here is on
diagonalization, and the notion of a basis of eigenspaces is also introduced.


Chapter 5 dealing with inner product spaces comes after spectral theory, 
because I wanted to do both the complex and the real cases simultaneously, 
and spectral theory provides a strong motivation for complex spaces. Other 
than the motivation, Chapters 4 and 5 do not depend on each other, and an 
instructor may do Chapter 5 first. 

Although I present the Jordan canonical form in Chapter 9, I usually 
do not have time to cover it during a one-semester course. I prefer to spend 
more time on topics discussed in Chapters 6 and 7 such as diagonalization 
of normal and self-adjoint operators, polar and singular value decomposition,
the structure of orthogonal matrices and orientation, and the theory
of quadratic forms. 


I feel that these topics are more important for applications than the 
Jordan canonical form, despite the definite beauty of the latter. However, I 
added Chapter 9 so the instructor may skip some of the topics in Chapters 
6 and 7 and present the Jordan Decomposition Theorem instead. 


I also included (new for 2009) Chapter 8, dealing with dual spaces and 
tensors. I feel that the material there, especially sections about tensors, is a 
bit too advanced for a first year linear algebra course, but some topics (for 
example, change of coordinates in the dual space) can be easily included in 
the syllabus. And it can be used as an introduction to tensors in a more 
advanced course. Note that the results presented in this chapter are true
for an arbitrary field. 


I have tried to present the material in the book rather informally, preferring
intuitive geometric reasoning to formal algebraic manipulations, so to
a purist the book may seem not sufficiently rigorous. Throughout the book
I usually (when it does not lead to confusion) identify a linear transformation
and its matrix. This allows for a simpler notation, and I feel that
overemphasizing the difference between a transformation and its matrix may 
confuse an inexperienced student. Only when the difference is crucial, for 
example when analyzing how the matrix of a transformation changes under 
the change of the basis, I use a special notation to distinguish between a 
transformation and its matrix. 
Chapter 1


Basic Notions 


1. Vector spaces 


A vector space $V$ is a collection of objects, called vectors (denoted in this
book by lowercase bold letters, like $\mathbf{v}$), along with two operations, addition
of vectors and multiplication by a number (scalar)¹, such that the following
8 properties (the so-called axioms of a vector space) hold:


The first 4 properties deal with the addition: 


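1. Commutativity: $\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}$ for all $\mathbf{v}, \mathbf{w} \in V$;

2. Associativity: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$ for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$;

3. Zero vector: there exists a special vector, denoted by $\mathbf{0}$, such that
$\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$;

4. Additive inverse: for every vector $\mathbf{v} \in V$ there exists a vector $\mathbf{w} \in V$
such that $\mathbf{v} + \mathbf{w} = \mathbf{0}$. Such an additive inverse is usually denoted as
$-\mathbf{v}$;

The next two properties concern multiplication:

5. Multiplicative identity: $1\mathbf{v} = \mathbf{v}$ for all $\mathbf{v} \in V$;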


¹ We need some visual distinction between vectors and other objects, so in this book we use
bold lowercase letters for vectors and regular lowercase letters for numbers (scalars). In some (more
advanced) books Latin letters are reserved for vectors, while Greek letters are used for scalars; in
even more advanced texts any letter can be used for anything and the reader must understand
from the context what each symbol means. I think it is helpful, especially for a beginner, to have
some visual distinction between different objects, so a bold lowercase letter will always denote a
vector. And on a blackboard an arrow (as in $\vec{v}$) is used to identify a vector.
A question arises, “How can one memorize the above properties?” And the
answer is that one does not need to, see below!

If you do not know what a field is, do not worry, since in this book we
consider only the case of real and complex spaces.
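6. Multiplicative associativity: $(\alpha\beta)\mathbf{v} = \alpha(\beta\mathbf{v})$ for all $\mathbf{v} \in V$ and all
scalars $\alpha$, $\beta$;

And finally, two distributive properties, which connect multiplication
and addition:

7. $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$ for all $\mathbf{u}, \mathbf{v} \in V$ and all scalars $\alpha$;

8. $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$ for all $\mathbf{v} \in V$ and all scalars $\alpha$, $\beta$.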


Remark. The above properties seem hard to memorize, but it is not necessary
to memorize them. They are simply the familiar rules of algebraic manipulation with
numbers that you know from high school. The only new twist here is that
you have to understand what operations you can apply to what objects. You
can add vectors, and you can multiply a vector by a number (scalar). Of
course, with numbers you can perform all the manipulations that you have
learned before. But you cannot multiply two vectors, or add a number to
a vector.


Remark. It is easy to prove that the zero vector $\mathbf{0}$ is unique, and that given
$\mathbf{v} \in V$ its additive inverse $-\mathbf{v}$ is also unique.

It is also not hard to show using properties 5, 6 and 8 that $\mathbf{0} = 0\mathbf{v}$ for
any $\mathbf{v} \in V$, and that $-\mathbf{v} = (-1)\mathbf{v}$. Note that to do this one still needs to
use other properties of a vector space in the proofs, in particular properties
3 and 4.


If the scalars are the usual real numbers, we call the space V a real 
vector space. If the scalars are the complex numbers, i.e. if we can multiply 
vectors by complex numbers, we call the space V a complex vector space. 


Note, that any complex vector space is a real vector space as well (if we 
can multiply by complex numbers, we can multiply by real numbers), but 
not the other way around. 


It is also possible to consider a situation when the scalars are elements of 
an arbitrary field F. In this case we say that V is a vector space over the field 
F. Although many of the constructions in the book (in particular, everything 
in Chapters 1-3) work for general fields, in this text we are considering only 
real and complex vector spaces. 


If we do not specify the set of scalars, or use a letter F for it, then the 
results are true for both real and complex spaces. If we need to distinguish 
real and complex cases, we will explicitly say which case we are considering. 


Note that in the definition of a vector space over an arbitrary field, we
require the set of scalars to be a field, so we can always divide (without a
remainder) by a non-zero scalar. Thus, it is possible to consider vector spaces
over the rationals, but not over the integers.
1.1. Examples.


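Example. The space $\mathbb{R}^n$ consists of all columns of size $n$,

$$\mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}$$

whose entries are real numbers. Addition and multiplication are defined
entrywise, i.e.

$$\alpha \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} \alpha v_1 \\ \alpha v_2 \\ \vdots \\ \alpha v_n \end{pmatrix}, \qquad \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} + \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_n \end{pmatrix} = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}.$$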


Example. The space $\mathbb{C}^n$ also consists of columns of size $n$, only the entries
now are complex numbers. Addition and multiplication are defined exactly
as in the case of $\mathbb{R}^n$; the only difference is that we can now multiply vectors
by complex numbers, i.e. $\mathbb{C}^n$ is a complex vector space.


Many results in this text are true for both $\mathbb{R}^n$ and $\mathbb{C}^n$. In such cases we
will use the notation $\mathbb{F}^n$.
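
The entrywise operations above can be tried out directly. The following is only a minimal sketch, assuming that a column is stored as a Python list and that exact rational entries (`Fraction`) stand in for real numbers; the helper names `add` and `scale` are illustrative and not part of the text. It spot-checks a few of the axioms on sample vectors, which is of course not a proof.

```python
from fractions import Fraction

def add(v, w):
    """Entrywise sum of two columns of the same size."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same size")
    return [a + b for a, b in zip(v, w)]

def scale(alpha, v):
    """Entrywise product of the scalar alpha and the column v."""
    return [alpha * a for a in v]

# Sample vectors in R^3 (rational entries, so the arithmetic is exact).
v = [Fraction(1), Fraction(2), Fraction(3)]
w = [Fraction(4), Fraction(0), Fraction(-1)]
zero = [Fraction(0)] * 3

assert add(v, w) == add(w, v)                                # axiom 1
assert add(v, zero) == v                                     # axiom 3
assert add(v, scale(-1, v)) == zero                          # axiom 4, with -v = (-1)v
assert scale(2, add(v, w)) == add(scale(2, v), scale(2, w))  # axiom 7

# The same functions work in C^2: Python's complex numbers are the scalars.
z = [1 + 2j, 3j]
print(scale(1j, z))
```

Since each operation acts on one entry at a time, every check above reduces to the corresponding rule for numbers.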


Example. The space $M_{m\times n}$ (also denoted as $M_{m,n}$) of $m \times n$ matrices: the
addition and multiplication by scalars are defined entrywise. If we allow
only real entries (and so multiplication only by reals), then we have a
real vector space; if we allow complex entries and multiplication by complex
numbers, we then have a complex vector space.


Formally, we have to distinguish between real and complex
cases, i.e. write something like $M^{\mathbb{R}}_{m\times n}$ or $M^{\mathbb{C}}_{m\times n}$. However, in most situations
there is no difference between the real and complex cases, and there is no
need to specify which case we are considering. If there is a difference we say
explicitly which case we are considering.


Remark. As we mentioned above, the axioms of a vector space are just the 
familiar rules of algebraic manipulations with (real or complex) numbers, 
so if we put scalars (numbers) for the vectors, all axioms will be satisfied. 
Thus, the set R of real numbers is a real vector space, and the set C of 
complex numbers is a complex vector space. 


More importantly, since in the above examples all vector operations 
(addition and multiplication by a scalar) are performed entrywise, for these 
examples the axioms of a vector space are automatically satisfied because 
they are satisfied for scalars (can you see why?). So, we do not have to 
check the axioms, we get the fact that the above examples are indeed vector
spaces for free! 

The same applies to the next example: the coefficients of the
polynomials play the role of entries there.


Example. The space $\mathbb{P}_n$ of polynomials of degree at most $n$ consists of all
polynomials $p$ of the form

$$p(t) = a_0 + a_1 t + a_2 t^2 + \ldots + a_n t^n,$$

where $t$ is the independent variable. Note that some, or even all, coefficients
$a_k$ can be 0.


In the case of real coefficients $a_k$ we have a real vector space; complex
coefficients give us a complex vector space. Again, we will specify whether we
are treating the real or the complex case only when it is essential; otherwise everything
applies to both cases.


Question: What are zero vectors in each of the above examples? 


1.2. Matrix notation. An m x n matrix is a rectangular array with m 
rows and n columns. Elements of the array are called entries of the matrix. 

It is often convenient to denote matrix entries by indexed letters: the 
first index denotes the number of the row, where the entry is, and the second 
one is the number of the column. For example 


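$$A = (a_{j,k})_{j=1,\,k=1}^{m,\;n} = \begin{pmatrix} a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\ a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} & a_{m,2} & \ldots & a_{m,n} \end{pmatrix} \tag{1.1}$$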


is a general way to write an m x n matrix. 

Very often for a matrix $A$ the entry in row number $j$ and column number
$k$ is denoted by $A_{j,k}$ or $(A)_{j,k}$, and sometimes, as in example (1.1) above, the
same letter but in lowercase is used for the matrix entries.


Given a matrix $A$, its transpose (or transposed matrix) $A^T$ is defined
by transforming the rows of $A$ into the columns. For example

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.$$

So, the columns of $A^T$ are the rows of $A$ and vice versa, the rows of $A^T$ are
the columns of $A$.

The formal definition is as follows: $(A^T)_{j,k} = (A)_{k,j}$, meaning that the
entry of $A^T$ in the row number $j$ and column number $k$ equals the entry of
$A$ in the row number $k$ and column number $j$.
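
The formal definition of the transpose translates almost literally into code. The sketch below assumes a matrix is stored as a list of rows; the function name `transpose` is an illustrative choice, and indices in the code start at 0 rather than 1.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    m, n = len(A), len(A[0])
    # Row j of the transpose is column j of A.
    return [[A[k][j] for k in range(m)] for j in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]
At = transpose(A)

# Entry (j, k) of the transpose equals entry (k, j) of A.
assert all(At[j][k] == A[k][j] for j in range(3) for k in range(2))
# Transposing twice gives back the original matrix.
assert transpose(At) == A
```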
The transpose of a matrix has a very nice interpretation in terms of
linear transformations, namely it gives the so-called adjoint transformation. 
We will study this in detail later, but for now transposition will be just a 
useful formal operation. 

One of the first uses of the transpose is that we can write a column
vector $\mathbf{x} \in \mathbb{F}^n$ (recall that $\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$) as $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$. If we put
the column vertically, it will use significantly more space.


Exercises. 

1.1. Let $\mathbf{x} = (1, 2, 3)^T$, $\mathbf{y} = (y_1, y_2, y_3)^T$, $\mathbf{z} = (4, 2, 1)^T$. Compute $2\mathbf{x}$, $3\mathbf{y}$, $\mathbf{x} + 2\mathbf{y} - 3\mathbf{z}$.

1.2. Which of the following sets (with natural addition and multiplication by a
scalar) are vector spaces? Justify your answer.

a) The set of all continuous functions on the interval $[0, 1]$;

b) The set of all non-negative functions on the interval $[0, 1]$;

c) The set of all polynomials of degree exactly $n$;

d) The set of all symmetric $n \times n$ matrices, i.e. the set of matrices
$A = \{a_{j,k}\}_{j,k=1}^n$ such that $A^T = A$.
1.3. True or false:

a) Every vector space contains a zero vector;

b) A vector space can have more than one zero vector;

c) An $m \times n$ matrix has $m$ rows and $n$ columns;

d) If $f$ and $g$ are polynomials of degree $n$, then $f + g$ is also a polynomial of
degree $n$;

e) If $f$ and $g$ are polynomials of degree at most $n$, then $f + g$ is also a
polynomial of degree at most $n$.


1.4. Prove that a zero vector 0 of a vector space V is unique. 

1.5. What matrix is the zero vector of the space $M_{2\times 3}$?

1.6. Prove that the additive inverse, defined in Axiom 4 of a vector space, is unique.

1.7. Prove that $0\mathbf{v} = \mathbf{0}$ for any vector $\mathbf{v} \in V$.

1.8. Prove that for any vector $\mathbf{v}$ its additive inverse $-\mathbf{v}$ is given by $(-1)\mathbf{v}$.
2. Linear combinations, bases.


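Let $V$ be a vector space, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in V$ be a collection of vectors.
A linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is a sum of the form

$$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_p \mathbf{v}_p = \sum_{k=1}^{p} \alpha_k \mathbf{v}_k.$$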


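Definition 2.1. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \in V$ is called a basis (for
the vector space $V$) if any vector $\mathbf{v} \in V$ admits a unique representation as
a linear combination

$$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k.$$

The coefficients $\alpha_1, \alpha_2, \ldots, \alpha_n$ are called coordinates of the vector $\mathbf{v}$ (in the
basis, or with respect to the basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$).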


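Another way to say that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis is to say that for any
possible choice of the right side $\mathbf{v}$, the equation $x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_n \mathbf{v}_n = \mathbf{v}$
(with unknowns $x_k$) has a unique solution.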


Before discussing any properties of bases², let us give a few examples,
showing that such objects exist, and that it makes sense to study them. 


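Example 2.2. In the first example the space $V$ is $\mathbb{F}^n$, where $\mathbb{F}$ is either $\mathbb{R}$
or $\mathbb{C}$. Consider vectors

$$\mathbf{e}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \mathbf{e}_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad \mathbf{e}_n = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}$$

(the vector $\mathbf{e}_k$ has all entries 0 except the entry number $k$, which is 1). The
system of vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is a basis in $\mathbb{F}^n$. Indeed, any vector

$$\mathbf{v} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{F}^n$$

can be represented as the linear combination

$$\mathbf{v} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \ldots + x_n \mathbf{e}_n = \sum_{k=1}^{n} x_k \mathbf{e}_k$$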

² The plural of “basis” is “bases”, the same as the plural of “base”.
and this representation is unique. The system $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n \in \mathbb{F}^n$ is called
the standard basis in $\mathbb{F}^n$.


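Example 2.3. In this example the space is the space $\mathbb{P}_n$ of the polynomials
of degree at most $n$. Consider vectors (polynomials) $\mathbf{e}_0, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n \in \mathbb{P}_n$
defined by

$$\mathbf{e}_0 := 1, \quad \mathbf{e}_1 := t, \quad \mathbf{e}_2 := t^2, \quad \mathbf{e}_3 := t^3, \quad \ldots, \quad \mathbf{e}_n := t^n.$$

Clearly, any polynomial $p$, $p(t) = a_0 + a_1 t + a_2 t^2 + \ldots + a_n t^n$, admits a unique
representation

$$p = a_0 \mathbf{e}_0 + a_1 \mathbf{e}_1 + \ldots + a_n \mathbf{e}_n.$$

So the system $\mathbf{e}_0, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n \in \mathbb{P}_n$ is a basis in $\mathbb{P}_n$. We will call it the
standard basis in $\mathbb{P}_n$.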


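Remark 2.4. If a vector space $V$ has a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, then any vector
$\mathbf{v}$ is uniquely defined by its coefficients in the decomposition $\mathbf{v} = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k$.
So, if we stack the coefficients $\alpha_k$ in a column, we can operate with them
as if they were column vectors, i.e. as with elements of $\mathbb{F}^n$ (again here $\mathbb{F}$ is
either $\mathbb{R}$ or $\mathbb{C}$, but everything also works for an abstract field $\mathbb{F}$).

Namely, if $\mathbf{v} = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k$ and $\mathbf{w} = \sum_{k=1}^{n} \beta_k \mathbf{v}_k$, then

$$\mathbf{v} + \mathbf{w} = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k + \sum_{k=1}^{n} \beta_k \mathbf{v}_k = \sum_{k=1}^{n} (\alpha_k + \beta_k) \mathbf{v}_k,$$

i.e. to get the column of coordinates of the sum one just needs to add the
columns of coordinates of the summands. Similarly, to get the coordinates
of $\alpha\mathbf{v}$ we need simply to multiply the column of coordinates of $\mathbf{v}$ by $\alpha$.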
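
As an illustration of this remark, here is a small sketch for polynomials in the standard basis $1, t, t^2, \ldots, t^n$. It assumes a polynomial is stored as the list of its coordinates `[a_0, a_1, ..., a_n]`; the helper names are hypothetical. Adding coordinate columns and then evaluating gives the same values as evaluating the two polynomials and adding the results.

```python
def evaluate(coords, t):
    """Value at t of the polynomial with coordinates coords in the basis 1, t, t^2, ..."""
    return sum(a * t**k for k, a in enumerate(coords))

def add_coords(p, q):
    """Coordinates of p + q: add the coordinate columns entrywise."""
    return [a + b for a, b in zip(p, q)]

def scale_coords(alpha, p):
    """Coordinates of alpha * p: multiply the coordinate column by alpha."""
    return [alpha * a for a in p]

p = [1, 0, 2]    # 1 + 2t^2
q = [3, -1, 0]   # 3 - t

for t in (-2, 0, 1, 5):
    assert evaluate(add_coords(p, q), t) == evaluate(p, t) + evaluate(q, t)
    assert evaluate(scale_coords(4, p), t) == 4 * evaluate(p, t)
```

Once the coordinates are fixed, nothing in the code refers to polynomials any more: the computation is the same as for columns in $\mathbb{F}^3$.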


2.1. Generating and linearly independent systems. The definition 
of a basis says that any vector admits a unique representation as a linear 
combination. This statement is in fact two statements, namely that the rep- 
resentation exists and that it is unique. Let us analyze these two statements 
separately. 

If we only consider the existence we get the following notion.

Definition 2.5. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in V$ is called a generating
system (also a spanning system, or a complete system) in $V$ if any vector
$\mathbf{v} \in V$ admits a representation as a linear combination

$$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_p \mathbf{v}_p = \sum_{k=1}^{p} \alpha_k \mathbf{v}_k.$$

The only difference from the definition of a basis is that we do not assume 
that the representation above is unique. 


This is a very important remark, that will be used throughout the book. It
allows us to translate any statement about the standard column space $\mathbb{F}^n$
to a vector space $V$ with a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.
The words generating, spanning and complete here are synonyms. I personally
prefer the term complete, because of my operator theory background.
Generating and spanning are more often used in linear algebra textbooks. 


Clearly, any basis is a generating (complete) system. Also, if we have a
basis, say $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, and we add to it several vectors, say $\mathbf{v}_{n+1}, \ldots, \mathbf{v}_p$,
then the new system will be a generating (complete) system. Indeed, we can
represent any vector as a linear combination of the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$,
and just ignore the new ones (by putting the corresponding coefficients $\alpha_k = 0$).


Now, let us turn our attention to the uniqueness. We do not want to 
worry about existence, so let us consider the zero vector 0, which always 
admits a representation as a linear combination. 


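Definition. A linear combination $\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_p \mathbf{v}_p$ is called trivial
if $\alpha_k = 0$ $\forall k$.

A trivial linear combination is always (for all choices of vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$) equal to $\mathbf{0}$, and that is probably the reason for the name.

Definition. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in V$ is called linearly independent
if only the trivial linear combination ($\sum_{k=1}^{p} \alpha_k \mathbf{v}_k$ with $\alpha_k = 0$ $\forall k$)
of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ equals $\mathbf{0}$.

In other words, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is linearly independent iff the
equation $x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_p \mathbf{v}_p = \mathbf{0}$ (with unknowns $x_k$) has only the trivial
solution $x_1 = x_2 = \ldots = x_p = 0$.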


If a system is not linearly independent, it is called linearly dependent. 
By negating the definition of linear independence, we get the following 


Definition. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is called linearly dependent
if $\mathbf{0}$ can be represented as a nontrivial linear combination, $\mathbf{0} = \sum_{k=1}^{p} \alpha_k \mathbf{v}_k$.
Non-trivial here means that at least one of the coefficients $\alpha_k$ is non-zero.
This can be (and usually is) written as $\sum_{k=1}^{p} |\alpha_k| \ne 0$.

So, restating the definition we can say that a system is linearly dependent
if and only if there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_p$, $\sum_{k=1}^{p} |\alpha_k| \ne 0$, such
that

$$\sum_{k=1}^{p} \alpha_k \mathbf{v}_k = \mathbf{0}.$$

An alternative definition (in terms of equations) is that a system $\mathbf{v}_1,
\mathbf{v}_2, \ldots, \mathbf{v}_p$ is linearly dependent iff the equation

$$x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_p \mathbf{v}_p = \mathbf{0}$$

(with unknowns $x_k$) has a non-trivial solution. Non-trivial, once again,
means that at least one of the $x_k$ is different from 0, and it can be written
as $\sum_{k=1}^{p} |x_k| \ne 0$.
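
To make these definitions concrete, here is a small worked example (the specific vectors are chosen only for illustration and do not come from the text). In $\mathbb{R}^2$ the system of three vectors below is linearly dependent, while its first two vectors form a linearly independent system.

```latex
% A linearly dependent system in R^2 (illustrative choice of vectors):
\[
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad
\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad
\mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \end{pmatrix},
\qquad
1\cdot\mathbf{v}_1 + 1\cdot\mathbf{v}_2 + (-1)\cdot\mathbf{v}_3 = \mathbf{0}.
\]
% The combination is non-trivial: |1| + |1| + |-1| = 3 \ne 0.

% The subsystem v_1, v_2 is linearly independent:
\[
x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \mathbf{0}
\quad\Longrightarrow\quad x_1 = x_2 = 0 .
\]
```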
The following proposition gives an alternative description of linearly
dependent systems.


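Proposition 2.6. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in V$ is linearly dependent
if and only if one of the vectors $\mathbf{v}_k$ can be represented as a linear
combination of the other vectors,

$$\mathbf{v}_k = \sum_{\substack{j=1 \\ j \ne k}}^{p} \beta_j \mathbf{v}_j. \tag{2.1}$$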
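Proof. Suppose the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ is linearly dependent. Then there
exist scalars $\alpha_k$, $\sum_{k=1}^{p} |\alpha_k| \ne 0$, such that

$$\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_p \mathbf{v}_p = \mathbf{0}.$$

Let $k$ be the index such that $\alpha_k \ne 0$. Then, moving all terms except $\alpha_k \mathbf{v}_k$
to the right side we get

$$\alpha_k \mathbf{v}_k = -\sum_{\substack{j=1 \\ j \ne k}}^{p} \alpha_j \mathbf{v}_j.$$

Dividing both sides by $\alpha_k$ we get (2.1) with $\beta_j = -\alpha_j/\alpha_k$.

On the other hand, if (2.1) holds, $\mathbf{0}$ can be represented as a non-trivial
linear combination

$$\mathbf{v}_k - \sum_{\substack{j=1 \\ j \ne k}}^{p} \beta_j \mathbf{v}_j = \mathbf{0}$$

(it is non-trivial because the coefficient of $\mathbf{v}_k$ is 1).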

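Obviously, any basis is a linearly independent system. Indeed, if a system
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis, $\mathbf{0}$ admits a unique representation

$$\mathbf{0} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k.$$

Since the trivial linear combination always gives $\mathbf{0}$, the trivial linear combination
must be the only one giving $\mathbf{0}$.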

So, as we already discussed, if a system is a basis it is a complete (gen- 
erating) and linearly independent system. The following proposition shows 
that the converse implication is also true. 


Proposition 2.7. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \in V$ is a basis if and
only if it is linearly independent and complete (generating).


In many textbooks a basis is defined as a complete and linearly independent
system. By Proposition 2.7 this definition is equivalent to ours.
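Proof. We already know that a basis is always linearly independent and
complete, so in one direction the proposition is already proved.

Let us prove the other direction. Suppose a system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is linearly
independent and complete. Take an arbitrary vector $\mathbf{v} \in V$. Since the
system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is complete (generating), $\mathbf{v}$ can be represented
as

$$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k.$$

We only need to show that this representation is unique.

Suppose $\mathbf{v}$ admits another representation

$$\mathbf{v} = \sum_{k=1}^{n} \widetilde{\alpha}_k \mathbf{v}_k.$$

Then

$$\sum_{k=1}^{n} (\alpha_k - \widetilde{\alpha}_k) \mathbf{v}_k = \sum_{k=1}^{n} \alpha_k \mathbf{v}_k - \sum_{k=1}^{n} \widetilde{\alpha}_k \mathbf{v}_k = \mathbf{v} - \mathbf{v} = \mathbf{0}.$$

Since the system is linearly independent, $\alpha_k - \widetilde{\alpha}_k = 0$ $\forall k$, and thus the
representation $\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n$ is unique.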


Remark. In many textbooks a basis is defined as a complete and linearly 
independent system (by Proposition 2.7 this definition is equivalent to ours). 
Although this definition is more common than the one presented in this text, I
prefer the latter. It emphasizes the main property of a basis, namely that 
any vector admits a unique representation as a linear combination. 


Proposition 2.8. Any (finite) generating system contains a basis. 


Proof. Suppose $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p \in V$ is a generating (complete) set. If it is
linearly independent, it is a basis, and we are done.

Suppose it is not linearly independent, i.e. it is linearly dependent. Then
there exists a vector $\mathbf{v}_k$ which can be represented as a linear combination of
the vectors $\mathbf{v}_j$, $j \ne k$.

Since $\mathbf{v}_k$ can be represented as a linear combination of vectors $\mathbf{v}_j$, $j \ne k$,
any linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_p$ can be represented as a linear
combination of the same vectors without $\mathbf{v}_k$ (i.e. the vectors $\mathbf{v}_j$, $1 \le j \le p$,
$j \ne k$). So, if we delete the vector $\mathbf{v}_k$, the new system will still be a complete
one.

If the new system is linearly independent, we are done. If not, we repeat
the procedure.

Repeating this procedure finitely many times we arrive at a linearly
independent and complete system, because otherwise we delete all vectors
and end up with an empty set.
So, any finite complete (generating) set contains a complete linearly
independent subset, i.e. a basis. 
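
The deletion procedure from the proof can be sketched in code for vectors with rational entries. This is an illustration under stated assumptions, not the book's method: to decide whether a vector is a linear combination of the others, the sketch compares ranks computed by Gaussian elimination, a tool the book develops only in Chapter 2. The names `rank` and `extract_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of the given vectors (pivot count after elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extract_basis(system):
    """Delete redundant vectors one at a time until the system is independent."""
    system = list(system)
    while True:
        full = rank(system)
        for k in range(len(system)):
            rest = system[:k] + system[k + 1:]
            # Same rank without v_k means v_k is a combination of the others.
            if rank(rest) == full:
                system = rest
                break
        else:
            return system  # no vector can be deleted: the system is independent

generating = [[1, 0], [0, 1], [1, 1], [2, 2]]   # a generating system in R^2
basis = extract_basis(generating)
assert len(basis) == 2 and rank(basis) == 2
```

Which basis comes out depends on the order in which vectors are tried for deletion; the proof only guarantees that some basis is contained in the generating system.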


Exercises. 
2.1. Find a basis in the space of $3 \times 2$ matrices $M_{3\times 2}$.

2.2. True or false:

a) Any set containing a zero vector is linearly dependent;

b) A basis must contain $\mathbf{0}$;

c) Subsets of linearly dependent sets are linearly dependent;

d) Subsets of linearly independent sets are linearly independent;

e) If $\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n = \mathbf{0}$ then all scalars $\alpha_k$ are zero.

2.3. Recall that a matrix is called symmetric if $A^T = A$. Write down a basis in the
space of symmetric $2 \times 2$ matrices (there are many possible answers). How many
elements are in the basis?


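2.4. Write down a basis for the space of

a) $3 \times 3$ symmetric matrices;

b) $n \times n$ symmetric matrices;

c) $n \times n$ antisymmetric ($A^T = -A$) matrices.

2.5. Let a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be linearly independent but not
generating. Show that it is possible to find a vector $\mathbf{v}_{r+1}$ such that the system
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent. Hint: Take for $\mathbf{v}_{r+1}$ any vector that
cannot be represented as a linear combination $\sum_{k=1}^{r} \alpha_k \mathbf{v}_k$ and show that the system
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent.

2.6. Is it possible that vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly dependent, but the vectors
$\mathbf{w}_1 = \mathbf{v}_1 + \mathbf{v}_2$, $\mathbf{w}_2 = \mathbf{v}_2 + \mathbf{v}_3$ and $\mathbf{w}_3 = \mathbf{v}_3 + \mathbf{v}_1$ are linearly independent?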


The words “transformation”, “transform”, “mapping”, “map”, “operator”,
“function” all denote the same object.
3. Linear Transformations. Matrix-vector multiplication

A transformation $T$ from a set $X$ to a set $Y$ is a rule that for each argument
(input) $x \in X$ assigns a value (output) $y = T(x) \in Y$.


The set X is called the domain of T, and the set Y is called the target 
space or codomain of T. 


We write $T : X \to Y$ to say that $T$ is a transformation with the domain
X and the target space Y. 


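Definition. Let $V$, $W$ be vector spaces (over the same field $\mathbb{F}$). A transformation
$T : V \to W$ is called linear if

1. $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$ $\forall \mathbf{u}, \mathbf{v} \in V$;
2. $T(\alpha\mathbf{v}) = \alpha T(\mathbf{v})$ for all $\mathbf{v} \in V$ and for all scalars $\alpha \in \mathbb{F}$.

Properties 1 and 2 together are equivalent to the following one:

$$T(\alpha\mathbf{u} + \beta\mathbf{v}) = \alpha T(\mathbf{u}) + \beta T(\mathbf{v}) \quad \text{for all } \mathbf{u}, \mathbf{v} \in V \text{ and for all scalars } \alpha, \beta.$$
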
3.1. Examples. You have dealt with linear transformations before, maybe without
even suspecting it, as the examples below show.


Example. Differentiation: Let $V = \mathbb{P}_n$ (the set of polynomials of degree at
most $n$), $W = \mathbb{P}_{n-1}$, and let $T : \mathbb{P}_n \to \mathbb{P}_{n-1}$ be the differentiation operator,

$$T(p) := p' \qquad \forall p \in \mathbb{P}_n.$$

Since $(f + g)' = f' + g'$ and $(\alpha f)' = \alpha f'$, this is a linear transformation.


Example. Rotation: in this example $V = W = \mathbb{R}^2$ (the usual coordinate
plane), and a transformation $T_\gamma : \mathbb{R}^2 \to \mathbb{R}^2$ takes a vector in $\mathbb{R}^2$ and rotates
it counterclockwise by $\gamma$ radians. Since $T_\gamma$ rotates the plane as a whole,
it rotates as a whole the parallelogram used to define a sum of two vectors
(parallelogram law). Therefore property 1 of a linear transformation holds.
It is also easy to see that property 2 is true.


Example. Reflection: in this example again $V = W = \mathbb{R}^2$, and the transformation
$T : \mathbb{R}^2 \to \mathbb{R}^2$ is the reflection in the first coordinate axis, see the
figure. It can also be shown geometrically that this transformation is linear,
but we will use another way to show that.

Namely, it is easy to write a formula for $T$,
$$T\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ -x_2 \end{pmatrix},$$
and from this formula it is easy to check that the transformation is linear.

Figure 1. Rotation


Example. Let us investigate linear transformations $T : \mathbb{R} \to \mathbb{R}$. Any such
transformation is given by the formula
$$T(x) = ax \quad \text{where } a = T(1).$$
Indeed,
$$T(x) = T(x \times 1) = xT(1) = xa = ax.$$
So, any linear transformation of $\mathbb{R}$ is just a multiplication by a constant.

3.2. Linear transformations $F^n \to F^m$. Matrix-column multiplication.
It turns out that a linear transformation $T : F^n \to F^m$ also can be
represented as a multiplication, not by a scalar, but by a matrix.


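Let us see how. Let $T : F^n \to F^m$ be a linear transformation. What
information do we need to compute $T(\mathbf{x})$ for all vectors $\mathbf{x} \in F^n$? My claim is
that it is sufficient to know how $T$ acts on the standard basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$
of $F^n$. Namely, it is sufficient to know $n$ vectors in $F^m$ (i.e. the vectors of
size $m$),
$$\mathbf{a}_1 := T(\mathbf{e}_1), \quad \mathbf{a}_2 := T(\mathbf{e}_2), \quad \ldots, \quad \mathbf{a}_n := T(\mathbf{e}_n).$$

Indeed, let
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}.$$
Then $\mathbf{x} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \ldots + x_n \mathbf{e}_n = \sum_{k=1}^{n} x_k \mathbf{e}_k$ and
$$T(\mathbf{x}) = T\Bigl(\sum_{k=1}^{n} x_k \mathbf{e}_k\Bigr) = \sum_{k=1}^{n} T(x_k \mathbf{e}_k) = \sum_{k=1}^{n} x_k T(\mathbf{e}_k) = \sum_{k=1}^{n} x_k \mathbf{a}_k.$$
So, if we join the vectors (columns) $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ together in a matrix
$A = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n]$ ($\mathbf{a}_k$ being the $k$th column of $A$, $k = 1, 2, \ldots, n$), this
matrix contains all the information about $T$.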

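Let us show how one should define the product of a matrix and a vector
(column) to represent the transformation $T$ as a product, $T(\mathbf{x}) = A\mathbf{x}$. Let
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\ a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{m,1} & a_{m,2} & \ldots & a_{m,n} \end{pmatrix}.$$

Recall that the column number $k$ of $A$ is the vector $\mathbf{a}_k$, i.e.
$$\mathbf{a}_k = \begin{pmatrix} a_{1,k} \\ a_{2,k} \\ \vdots \\ a_{m,k} \end{pmatrix}.$$
Then if we want $A\mathbf{x} = T(\mathbf{x})$ we get
$$A\mathbf{x} = \sum_{k=1}^{n} x_k \mathbf{a}_k = x_1 \begin{pmatrix} a_{1,1} \\ a_{2,1} \\ \vdots \\ a_{m,1} \end{pmatrix} + x_2 \begin{pmatrix} a_{1,2} \\ a_{2,2} \\ \vdots \\ a_{m,2} \end{pmatrix} + \ldots + x_n \begin{pmatrix} a_{1,n} \\ a_{2,n} \\ \vdots \\ a_{m,n} \end{pmatrix}.$$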

So, the matrix-vector multiplication should be performed by the follow- 
ing column by coordinate rule: 


multiply each column of the matrix by the corresponding coordi- 
nate of the vector. 


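Example.
$$\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = 1 \begin{pmatrix} 1 \\ 3 \end{pmatrix} + 2 \begin{pmatrix} 2 \\ 2 \end{pmatrix} + 3 \begin{pmatrix} 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 14 \\ 10 \end{pmatrix}.$$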




The “column by coordinate” rule is very well adapted for parallel com- 
puting. It will be also very important in different theoretical constructions 
later. 
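The “column by coordinate” rule can be sketched in a few lines of Python. This is only an illustration: the function name `matvec_by_columns` and the representation of a matrix as a list of rows are assumptions made here, not notation from the text.

```python
def matvec_by_columns(A, x):
    """Return A x, computed as x_1*a_1 + ... + x_n*a_n (a_k = k-th column of A)."""
    m, n = len(A), len(A[0])
    if len(x) != n:
        raise ValueError("number of columns of A must equal the size of x")
    result = [0] * m
    for k in range(n):                   # one pass per column a_k
        for j in range(m):
            result[j] += x[k] * A[j][k]  # accumulate x_k * a_k
    return result


A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 2, 3]
y = matvec_by_columns(A, x)
print(y)
```

Each pass of the outer loop adds one scaled column to the result, so the passes are independent of each other except for the final summation.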


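However, when doing computations manually, it is more convenient to
compute the result one entry at a time. This can be expressed as the
following row by column rule:

To get the entry number $k$ of the result, one needs to multiply row
number $k$ of the matrix by the vector, that is, if $A\mathbf{x} = \mathbf{y}$, then
$$y_k = \sum_{j=1}^{n} a_{k,j} x_j, \qquad k = 1, 2, \ldots, m;$$
here $x_j$ and $y_k$ are coordinates of the vectors $\mathbf{x}$ and $\mathbf{y}$ respectively, and $a_{j,k}$
are the entries of the matrix $A$.

Example.
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \cdot 1 + 2 \cdot 2 + 3 \cdot 3 \\ 4 \cdot 1 + 5 \cdot 2 + 6 \cdot 3 \end{pmatrix} = \begin{pmatrix} 14 \\ 32 \end{pmatrix}$$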
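For comparison, here is a minimal Python sketch of the row by column rule, applied to the same matrix and vector as in the example above. The name `matvec_by_rows` and the list-of-rows representation are illustrative assumptions.

```python
def matvec_by_rows(A, x):
    """Return A x, computing entry k as (row k of A) times x."""
    for row in A:
        if len(row) != len(x):
            # row and vector must "end simultaneously"
            raise ValueError("each row of A must have as many entries as x")
    return [sum(a_kj * x_j for a_kj, x_j in zip(row, x)) for row in A]


y = matvec_by_rows([[1, 2, 3],
                    [4, 5, 6]], [1, 2, 3])
print(y)
```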


3.3. Linear transformations and generating sets. As we discussed
above, a linear transformation $T$ (acting from $F^n$ to $F^m$) is completely defined
by its values on the standard basis in $F^n$.

The fact that we consider the standard basis is not essential, one can 
consider any basis, even any generating (spanning) set. Namely, 


A linear transformation $T : V \to W$ is completely defined by its
values on a generating set (in particular by its values on a basis).

So, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a generating set (in particular, if it is a basis) in $V$,
and $T$ and $T_1$ are linear transformations $T, T_1 : V \to W$ such that
$$T\mathbf{v}_k = T_1\mathbf{v}_k, \qquad k = 1, 2, \ldots, n,$$
then $T = T_1$.

The proof of this statement is trivial and left as an exercise.
3.4. Conclusions. 


• To get the matrix of a linear transformation $T : F^n \to F^m$ one needs
to join the vectors $\mathbf{a}_k = T\mathbf{e}_k$ (where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard
basis in $F^n$) into a matrix: the $k$th column of the matrix is $\mathbf{a}_k$,
$k = 1, 2, \ldots, n$.

• If the matrix $A$ of the linear transformation $T$ is known, then $T(\mathbf{x})$
can be found by the matrix-vector multiplication, $T(\mathbf{x}) = A\mathbf{x}$. To
perform matrix-vector multiplication one can use either the “column by
coordinate” or the “row by column” rule.
The latter seems more appropriate for manual computations.
The former is well adapted for parallel computers, and will be used
in different theoretical constructions.

The notation $T\mathbf{v}$ is often used instead of $T(\mathbf{v})$.

In the matrix-vector multiplication using the “row by column” rule be sure
that you have the same number of entries in the row and in the column. The
entries in the row and in the column should end simultaneously: if not, the
multiplication is not defined.


For a linear transformation $T : F^n \to F^m$, its matrix is usually denoted
as $[T]$. However, very often people do not distinguish between a linear
transformation and its matrix, and use the same symbol for both. When it does
not lead to confusion, we will also use the same symbol for a transformation
and its matrix.
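The recipe for finding the matrix of a transformation (apply $T$ to each standard basis vector and use the results as columns) can be sketched as follows. The helper name `matrix_of` is hypothetical; the sample transformation is the reflection in the first coordinate axis from the examples above.

```python
def matrix_of(T, n):
    """Return the matrix (list of rows) whose k-th column is T(e_k)."""
    columns = []
    for k in range(n):
        e_k = [1 if i == k else 0 for i in range(n)]  # k-th standard basis vector
        columns.append(T(e_k))
    m = len(columns[0])
    # columns[k][j] is entry (j, k) of the matrix; rearrange into rows
    return [[columns[k][j] for k in range(n)] for j in range(m)]


def reflect(x):
    """Reflection of R^2 in the first coordinate axis."""
    return [x[0], -x[1]]


M = matrix_of(reflect, 2)
print(M)
```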


Since a linear transformation is essentially a multiplication, the notation
$T\mathbf{v}$ is often used instead of $T(\mathbf{v})$. We will also use this notation. Note that
the usual order of algebraic operations applies, i.e. $T\mathbf{v} + \mathbf{u}$ means $T(\mathbf{v}) + \mathbf{u}$,
not $T(\mathbf{v} + \mathbf{u})$.

Remark. In the matrix-vector multiplication $A\mathbf{x}$ the number of columns
of the matrix $A$ must coincide with the size of the vector $\mathbf{x}$, i.e. a
vector in $F^n$ can only be multiplied by an $m \times n$ matrix.

It makes sense, since an $m \times n$ matrix defines a linear transformation
$F^n \to F^m$, so the vector $\mathbf{x}$ must belong to $F^n$.


The easiest way to remember this is to remember that if performing 
multiplication you run out of some elements faster, then the multiplication 
is not defined. For example, if using the “row by column” rule you run 
out of row entries, but still have some unused entries in the vector, the 
multiplication is not defined. It is also not defined if you run out of vector’s 
entries, but still have unused entries in the row. 


Remark. One does not have to restrict oneself to the case of $F^n$ with the
standard basis: everything described in this section works for transformations
between arbitrary vector spaces as long as there is a basis in the domain and
in the target space. Of course, if one changes a basis, the matrix of the linear
transformation will be different. This will be discussed later in Section 8.


Exercises. 
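3.1. Multiply:

a) $\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix}$;

b) $\begin{pmatrix} 1 & 2 \\ 0 & 1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \end{pmatrix}$;

c) $\begin{pmatrix} 1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$;

d) $\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}$.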

3.2. Let a linear transformation in $\mathbb{R}^2$ be the reflection in the line $x_1 = x_2$. Find
its matrix.

3.3. For each linear transformation below find its matrix:

a) $T : \mathbb{R}^2 \to \mathbb{R}^3$ defined by $T(x, y)^T = (x + 2y, 2x - 5y, 7y)^T$;

b) $T : \mathbb{R}^4 \to \mathbb{R}^3$ defined by $T(x_1, x_2, x_3, x_4)^T = (x_1 + x_2 + x_3 + x_4, x_2 - x_4, x_1 + 3x_2 + 6x_4)^T$;

c) $T : P_n \to P_n$, $Tf(t) = f'(t)$ (find the matrix with respect to the standard
basis $1, t, t^2, \ldots, t^n$);

d) $T : P_n \to P_n$, $Tf(t) = 2f(t) + 3f'(t) - 4f''(t)$ (again with respect to the
standard basis $1, t, t^2, \ldots, t^n$).

3.4. Find $3 \times 3$ matrices representing the transformations of $\mathbb{R}^3$ which:

a) project every vector onto the $x$-$y$ plane;
b) reflect every vector through the $x$-$y$ plane;
c) rotate the $x$-$y$ plane through $30°$, leaving the $z$-axis alone.
3.5. Let A be a linear transformation. If z is the center of the straight interval 


[x,y], show that Az is the center of the interval [Ax, Ay]. Hint: What does it 
mean that z is the center of the interval [x, y]? 


3.6. The set $\mathbb{C}$ of complex numbers can be canonically identified with the space $\mathbb{R}^2$
by treating each $z = x + iy \in \mathbb{C}$ as a column $(x, y)^T \in \mathbb{R}^2$.

a) Treating $\mathbb{C}$ as a complex vector space, show that the multiplication by
$\alpha = a + ib \in \mathbb{C}$ is a linear transformation in $\mathbb{C}$. What is its matrix?

b) Treating $\mathbb{C}$ as the real vector space $\mathbb{R}^2$ show that the multiplication by
$\alpha = a + ib$ defines a linear transformation there. What is its matrix?

c) Define $T(x + iy) = 2x - y + i(x - 3y)$. Show that this transformation is not
a linear transformation in the complex vector space $\mathbb{C}$, but if we treat $\mathbb{C}$
as the real vector space $\mathbb{R}^2$ then it is a linear transformation there (i.e. that
$T$ is a real linear but not a complex linear transformation).

Find the matrix of the real linear transformation $T$.

3.7. Show that any linear transformation in $\mathbb{C}$ (treated as a complex vector space)
is a multiplication by some $\alpha \in \mathbb{C}$.
4. Linear transformations as a vector space 


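What operations can we perform with linear transformations? We can always
multiply a linear transformation by a scalar, i.e. if we have a linear
transformation $T : V \to W$ and a scalar $\alpha$ we can define a new transformation
$\alpha T$ by
$$(\alpha T)\mathbf{v} = \alpha(T\mathbf{v}) \qquad \forall \mathbf{v} \in V.$$
It is easy to check that $\alpha T$ is also a linear transformation:
$$\begin{aligned} (\alpha T)(\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2) &= \alpha(T(\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2)) && \text{by the definition of } \alpha T \\ &= \alpha(\alpha_1 T\mathbf{v}_1 + \alpha_2 T\mathbf{v}_2) && \text{by the linearity of } T \\ &= \alpha_1 \alpha T\mathbf{v}_1 + \alpha_2 \alpha T\mathbf{v}_2 = \alpha_1 (\alpha T)\mathbf{v}_1 + \alpha_2 (\alpha T)\mathbf{v}_2. \end{aligned}$$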


If $T_1$ and $T_2$ are linear transformations with the same domain and target
space ($T_1 : V \to W$ and $T_2 : V \to W$, or in short $T_1, T_2 : V \to W$),
then we can add these transformations, i.e. define a new transformation
$T = (T_1 + T_2) : V \to W$ by
$$(T_1 + T_2)\mathbf{v} = T_1\mathbf{v} + T_2\mathbf{v} \qquad \forall \mathbf{v} \in V.$$

It is easy to check that the transformation $T_1 + T_2$ is a linear one; one just
needs to repeat the above reasoning for the linearity of $\alpha T$.

So, if we fix vector spaces $V$ and $W$ and consider the collection of all
linear transformations from $V$ to $W$ (let us denote it by $\mathcal{L}(V, W)$), we can
define 2 operations on $\mathcal{L}(V, W)$: multiplication by a scalar and addition.
It can be easily shown that these operations satisfy the axioms of a vector
space, defined in Section 1.


This should come as no surprise for the reader, since the axioms of a vector
space essentially mean that operations on vectors follow the standard rules of
algebra. And the operations on linear transformations are defined so as to
satisfy these rules!


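As an illustration, let us write down a formal proof of the first distributive
law (axiom 7) of a vector space. We want to show that $\alpha(T_1 + T_2) =
\alpha T_1 + \alpha T_2$. For any $\mathbf{v} \in V$
$$\begin{aligned} \alpha(T_1 + T_2)\mathbf{v} &= \alpha((T_1 + T_2)\mathbf{v}) && \text{by the definition of multiplication} \\ &= \alpha(T_1\mathbf{v} + T_2\mathbf{v}) && \text{by the definition of the sum} \\ &= \alpha T_1\mathbf{v} + \alpha T_2\mathbf{v} && \text{by Axiom 7 for } W \\ &= (\alpha T_1 + \alpha T_2)\mathbf{v} && \text{by the definition of the sum.} \end{aligned}$$

So indeed $\alpha(T_1 + T_2) = \alpha T_1 + \alpha T_2$.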


Remark. Linear operations (addition and multiplication by a scalar) on
linear transformations $T : F^n \to F^m$ correspond to the respective operations
on their matrices. Since we know that the set of $m \times n$ matrices is a vector
space, this immediately implies that $\mathcal{L}(F^n, F^m)$ is a vector space.

We presented the abstract proof above, first of all because it works for
general spaces, for example, for spaces without a basis, where we cannot
work with coordinates. Secondly, reasoning similar to the abstract argument
presented here is used in many places, so the reader will benefit from
understanding it.


And as the reader gains some mathematical sophistication, he/she will 
see that this abstract reasoning is indeed a very simple one, that can be 
performed almost automatically. 


5. Composition of linear transformations and matrix 
multiplication. 


5.1. Definition of the matrix multiplication. Knowing matrix-vector
multiplication, one can easily guess what is the natural way to define the
product $AB$ of two matrices: Let us multiply by $A$ each column of $B$
(matrix-vector multiplication) and join the resulting column-vectors into a matrix.
Formally,

if $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_r$ are the columns of $B$, then $A\mathbf{b}_1, A\mathbf{b}_2, \ldots, A\mathbf{b}_r$ are
the columns of the matrix $AB$.

Recalling the row by column rule for the matrix-vector multiplication we
get the following row by column rule for the matrices:

the entry $(AB)_{j,k}$ (the entry in the row $j$ and column $k$) of the
product $AB$ is defined by
$$(AB)_{j,k} = (\text{row } \#j \text{ of } A) \cdot (\text{column } \#k \text{ of } B).$$

Formally it can be rewritten as
$$(AB)_{j,k} = \sum_{l} a_{j,l} b_{l,k},$$
if $a_{j,k}$ and $b_{j,k}$ are entries of the matrices $A$ and $B$ respectively.

I intentionally did not speak about sizes of the matrices $A$ and $B$, but
if we recall the row by column rule for the matrix-vector multiplication, we
can see that in order for the multiplication to be defined, the size of a row
of $A$ should be equal to the size of a column of $B$.

In other words the product $AB$ is defined if and only if $A$ is an $m \times n$
and $B$ is an $n \times r$ matrix.
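Both descriptions of the product can be compared in a short Python sketch. The helper names (`matvec`, `column`, `matmul`) and the sample matrices are illustrative choices, not taken from the text; `matmul` uses the row by column formula, and the last line checks that column $k$ of $AB$ is $A$ times column $k$ of $B$.

```python
def matvec(A, x):
    """Row by column rule for a matrix and a vector."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def column(B, k):
    """Column number k (0-based) of the matrix B."""
    return [row[k] for row in B]


def matmul(A, B):
    """Product AB via (AB)_{j,k} = sum_l a_{j,l} b_{l,k}."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("A must be m x n and B must be n x r")
    r = len(B[0])
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(r)]
            for j in range(len(A))]


A = [[1, 2],
     [3, 4]]          # 2 x 2
B = [[0, 1, 2],
     [1, 0, 3]]       # 2 x 3
AB = matmul(A, B)
columns_agree = all(column(AB, k) == matvec(A, column(B, k)) for k in range(3))
print(AB, columns_agree)
```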


5.2. Motivation: composition of linear transformations. One can
ask oneself here: Why are we using such a complicated rule of multiplication?
Why don’t we just multiply matrices entrywise?

And the answer is that the multiplication, as it is defined above, arises
naturally from the composition of linear transformations.

We will usually identify a linear transformation and its matrix, but in
the next few paragraphs we will distinguish them.

Suppose we have two linear transformations, $T_1 : F^n \to F^m$ and $T_2 :
F^r \to F^n$. Define the composition $T = T_1 \circ T_2$ of the transformations $T_1$, $T_2$
as
$$T(\mathbf{x}) = T_1(T_2(\mathbf{x})) \qquad \forall \mathbf{x} \in F^r.$$
Note that $T_2(\mathbf{x}) \in F^n$. Since $T_1 : F^n \to F^m$, the expression $T_1(T_2(\mathbf{x}))$ is well
defined and the result belongs to $F^m$. So, $T : F^r \to F^m$.

It is easy to show that $T$ is a linear transformation (exercise), so it is
defined by an $m \times r$ matrix. How can one find this matrix, knowing the
matrices of $T_1$ and $T_2$?

Let $A$ be the matrix of $T_1$ and $B$ be the matrix of $T_2$. As we discussed in
the previous section, the columns of the matrix of $T$ are the vectors
$T(\mathbf{e}_1), T(\mathbf{e}_2), \ldots, T(\mathbf{e}_r)$, where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_r$ is the standard basis in $F^r$.
For $k = 1, 2, \ldots, r$ we have
$$T(\mathbf{e}_k) = T_1(T_2(\mathbf{e}_k)) = T_1(B\mathbf{e}_k) = T_1(\mathbf{b}_k) = A\mathbf{b}_k$$
(operators $T_2$ and $T_1$ are simply the multiplication by $B$ and $A$ respectively).

So, the columns of the matrix of $T$ are $A\mathbf{b}_1, A\mathbf{b}_2, \ldots, A\mathbf{b}_r$, and that is
exactly how the matrix $AB$ was defined!

Let us return to identifying again a linear transformation with its matrix.
Since the matrix multiplication agrees with the composition, we can (and
will) write $T_1 T_2$ instead of $T_1 \circ T_2$ and $T_1 T_2 \mathbf{x}$ instead of $T_1(T_2(\mathbf{x}))$.

Note: order of transformations! In the composition $T_1 T_2$ the transformation
$T_2$ is applied first! The way to remember this is to see that in $T_1 T_2 \mathbf{x}$ the
transformation $T_2$ meets $\mathbf{x}$ first.

Remark. There is another way of checking the dimensions of matrices in a
product, different from the row by column rule: for a composition $T_1 T_2$ to
be defined it is necessary that $T_2 \mathbf{x}$ belongs to the domain of $T_1$. If $T_2$ acts
from some space, say $F^r$, to $F^n$, then $T_1$ must act from $F^n$ to some space, say
$F^m$. So, in order for $T_1 T_2$ to be defined the matrices of $T_1$ and $T_2$ should be
of sizes $m \times n$ and $n \times r$ respectively, the same condition as obtained from
the row by column rule.
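The statement that the matrix of a composition is the product of the matrices can be checked on a small example. In the sketch below the matrices `A`, `B` and all helper names are assumptions chosen for illustration; `T1` and `T2` are defined as multiplication by `A` and `B`, and the matrix of `T1(T2(x))` is recovered column by column from the standard basis.

```python
def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def matmul(A, B):
    n = len(B)
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(len(B[0]))]
            for j in range(len(A))]


def matrix_of(T, r):
    """Matrix whose k-th column is T(e_k), e_k the standard basis of F^r."""
    cols = [T([1 if i == k else 0 for i in range(r)]) for k in range(r)]
    return [[cols[k][j] for k in range(r)] for j in range(len(cols[0]))]


A = [[1, 2], [0, 1], [1, 0]]     # matrix of T1 : F^2 -> F^3  (m = 3, n = 2)
B = [[1, 0, 2], [3, 1, 0]]       # matrix of T2 : F^3 -> F^2  (n = 2, r = 3)


def T1(x):
    return matvec(A, x)


def T2(x):
    return matvec(B, x)


def T(x):
    return T1(T2(x))             # T2 is applied first


composition_matrix = matrix_of(T, 3)
print(composition_matrix == matmul(A, B))
```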


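Example. Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be the reflection in the line $x_1 = 3x_2$. It is
a linear transformation, so let us find its matrix. To find the matrix, we
need to compute $T\mathbf{e}_1$ and $T\mathbf{e}_2$. However, the direct computation of $T\mathbf{e}_1$ and
$T\mathbf{e}_2$ involves significantly more trigonometry than a sane person is willing
to remember.

An easier way to find the matrix of $T$ is to represent it as a composition
of simple linear transformations. Namely, let $\gamma$ be the angle between the
$x_1$-axis and the line $x_1 = 3x_2$, and let $T_0$ be the reflection in the $x_1$-axis.
Then to get the reflection $T$ we can first rotate the plane by the angle $-\gamma$,
moving the line $x_1 = 3x_2$ to the $x_1$-axis, then reflect everything in the $x_1$-axis,
and then rotate the plane by $\gamma$, taking everything back. Formally it
can be written as
$$T = R_\gamma T_0 R_{-\gamma}$$
(note the order of terms!), where $R_\gamma$ is the rotation by $\gamma$. The matrix of $T_0$
is easy to compute,
$$T_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix},$$
the rotation matrices are known:
$$R_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix},$$
$$R_{-\gamma} = \begin{pmatrix} \cos(-\gamma) & -\sin(-\gamma) \\ \sin(-\gamma) & \cos(-\gamma) \end{pmatrix} = \begin{pmatrix} \cos\gamma & \sin\gamma \\ -\sin\gamma & \cos\gamma \end{pmatrix}.$$

To compute $\sin\gamma$ and $\cos\gamma$ take a vector in the line $x_1 = 3x_2$, say the vector
$(3, 1)^T$. Then
$$\cos\gamma = \frac{\text{first coordinate}}{\text{length}} = \frac{3}{\sqrt{3^2 + 1^2}} = \frac{3}{\sqrt{10}}$$
and similarly
$$\sin\gamma = \frac{\text{second coordinate}}{\text{length}} = \frac{1}{\sqrt{3^2 + 1^2}} = \frac{1}{\sqrt{10}}.$$

Gathering everything together we get
$$T = R_\gamma T_0 R_{-\gamma} = \frac{1}{\sqrt{10}} \begin{pmatrix} 3 & -1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \frac{1}{\sqrt{10}} \begin{pmatrix} 3 & 1 \\ -1 & 3 \end{pmatrix} = \frac{1}{10} \begin{pmatrix} 3 & -1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 3 & 1 \\ -1 & 3 \end{pmatrix}.$$

It remains only to perform matrix multiplication here to get the final result.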
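The remaining multiplication can be carried out mechanically. The sketch below is one way to do it in exact arithmetic, under the assumption that the two factors $1/\sqrt{10}$ are pulled out as the common factor $1/10$; the variable names are illustrative. As a sanity check, the resulting matrix should fix the vector $(3, 1)^T$ lying on the line and send the perpendicular vector $(-1, 3)^T$ to its negative.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(B)
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(len(B[0]))]
            for j in range(len(A))]


def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


R_plus = [[3, -1], [1, 3]]     # sqrt(10) * R_gamma
T0 = [[1, 0], [0, -1]]         # reflection in the x_1-axis
R_minus = [[3, 1], [-1, 3]]    # sqrt(10) * R_{-gamma}

product = matmul(matmul(R_plus, T0), R_minus)
T = [[Fraction(entry, 10) for entry in row] for row in product]

print(T)
print(matvec(T, [3, 1]), matvec(T, [-1, 3]))
```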


5.3. Properties of matrix multiplication. Matrix multiplication enjoys 
a lot of properties, familiar to us from high school algebra: 

1. Associativity: A(BC) = (AB)C, provided that either left or right 
side is well defined; we therefore can (and will) simply write ABC 
in this case. 

2. Distributivity: A(B + C) = AB + AC, (A+ B)C = AC + BC, 
provided either left or right side of each equation is well defined. 


3. One can take scalar multiples out: $A(\alpha B) = (\alpha A)B = \alpha(AB) =
\alpha AB$.

These properties are easy to prove. One should prove the corresponding
properties for linear transformations, and they almost trivially follow from
the definitions. The properties of linear transformations then imply the
properties for the matrix multiplication.


The new twist here is that the commutativity fails:

matrix multiplication is non-commutative, i.e. generally for
matrices $AB \ne BA$.

One can see easily that it would be unreasonable to expect the commutativity of
matrix multiplication. Indeed, let $A$ and $B$ be matrices of sizes $m \times n$ and
$n \times r$ respectively. Then the product $AB$ is well defined, but if $m \ne r$, $BA$
is not defined.

Even when both products are well defined, for example, when $A$ and $B$
are $n \times n$ (square) matrices, the multiplication is still non-commutative. If we
just pick the matrices $A$ and $B$ at random, the chances are that $AB \ne BA$:
we have to be very lucky to get $AB = BA$.
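A small Python sketch makes both failure modes concrete. The matrices below are illustrative choices, not taken from the text: the first pair is square with $AB \ne BA$, the second pair has sizes $2 \times 3$ and $3 \times 1$, so only one of the two products is defined.

```python
def matmul(A, B):
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("sizes do not match: A must be m x n, B must be n x r")
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(len(B[0]))]
            for j in range(len(A))]


# Square matrices: both products exist but differ.
A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]
AB, BA = matmul(A, B), matmul(B, A)

# Sizes 2 x 3 and 3 x 1: CD is defined, DC is not.
C = [[1, 2, 3], [4, 5, 6]]
D = [[1], [0], [1]]
CD = matmul(C, D)
try:
    matmul(D, C)
    DC_defined = True
except ValueError:
    DC_defined = False

print(AB, BA, CD, DC_defined)
```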


5.4. Transposed matrices and multiplication. Given a matrix $A$, its
transpose (or transposed matrix) $A^T$ is defined by transforming the rows of
$A$ into the columns. For example
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}^T = \begin{pmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{pmatrix}.$$
So, the columns of $A^T$ are the rows of $A$ and vice versa, the rows of $A^T$ are
the columns of $A$.

The formal definition is as follows: $(A^T)_{j,k} = (A)_{k,j}$, meaning that the
entry of $A^T$ in the row number $j$ and column number $k$ equals the entry of
$A$ in the row number $k$ and column number $j$.

The transpose of a matrix has a very nice interpretation in terms of
linear transformations, namely it gives the so-called adjoint transformation.
We will study this in detail later, but for now transposition will be just a
useful formal operation.

One of the first uses of the transpose is that we can write a column
vector $\mathbf{x} \in F^n$ as $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$. If we put the column vertically, it
will use significantly more space.

A simple analysis of the row by column rule shows that
$$(AB)^T = B^T A^T,$$
i.e. when you take the transpose of the product, you change the order of the
terms.

5.5. Trace and matrix multiplication. For a square ($n \times n$) matrix
$A = (a_{j,k})$ its trace (denoted by $\operatorname{trace} A$) is the sum of the diagonal entries
$$\operatorname{trace} A = \sum_{k=1}^{n} a_{k,k}.$$

Theorem 5.1. Let $A$ and $B$ be matrices of size $m \times n$ and $n \times m$ respectively
(so both products $AB$ and $BA$ are well defined). Then
$$\operatorname{trace}(AB) = \operatorname{trace}(BA).$$

We leave the proof of this theorem as an exercise, see Problem 5.6 below.
There are essentially two ways of proving this theorem. One is to compute
the diagonal entries of $AB$ and of $BA$ and compare their sums. This method
requires some proficiency in manipulating sums in $\sum$ notation.

If you are not comfortable with algebraic manipulations, there is another
way. We can consider two linear transformations, $T$ and $T_1$, acting from
$M_{n \times m}$ to $F = F^1$ defined by
$$T(X) = \operatorname{trace}(AX), \qquad T_1(X) = \operatorname{trace}(XA).$$
To prove the theorem it is sufficient to show that $T = T_1$; the equality for
$X = B$ gives the theorem.

Since a linear transformation is completely defined by its values on a
generating system, we need just to check the equality on some simple matrices,
for example on the matrices $X_{j,k}$, which have all entries 0 except the entry
1 in the intersection of the $j$th column and $k$th row.
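Neither identity is proved by a numerical check, but one concrete instance can help before attempting the proofs. The sketch below, with illustrative matrices and helper names that are assumptions of this example, verifies $(AB)^T = B^T A^T$ and $\operatorname{trace}(AB) = \operatorname{trace}(BA)$ for a $2 \times 3$ matrix $A$ and a $3 \times 2$ matrix $B$.

```python
def matmul(A, B):
    n = len(B)
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(len(B[0]))]
            for j in range(len(A))]


def transpose(A):
    """Rows of A become columns."""
    return [list(col) for col in zip(*A)]


def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[k][k] for k in range(len(A)))


A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, -1]]          # 3 x 2

lhs = transpose(matmul(A, B))
rhs = matmul(transpose(B), transpose(A))
print(lhs == rhs)
print(trace(matmul(A, B)), trace(matmul(B, A)))
```

Note that $AB$ is $2 \times 2$ while $BA$ is $3 \times 3$, so the two products are different matrices with equal traces.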


Exercises. 


5.1. Let 
2 


1 2 10 2 i, = 2." 3 . 
an(}2)ae(193).ce(4 2 4). 22 (3 


a) Mark all the products that are defined, and give the dimensions of the
result: $AB$, $BA$, $ABC$, $ABD$, $BC$, $BC^T$, $B^TC$, $DC$, $D^TC^T$.

b) Compute $AB$, $A(3B + C)$, $B^TA$, $A(BD)$, $(AB)D$.

5.2. Let $T_\gamma$ be the matrix of rotation by $\gamma$ in $\mathbb{R}^2$. Check by matrix multiplication
that $T_\gamma T_{-\gamma} = T_{-\gamma} T_\gamma = I$.

5.3. Multiply two rotation matrices $T_\alpha$ and $T_\beta$ (it is a rare case when the
multiplication is commutative, i.e. $T_\alpha T_\beta = T_\beta T_\alpha$, so the order is not essential). Deduce
formulas for $\sin(\alpha + \beta)$ and $\cos(\alpha + \beta)$ from here.

5.4. Find the matrix of the orthogonal projection in $\mathbb{R}^2$ onto the line $x_1 = -2x_2$.
Hint: What is the matrix of the projection onto the coordinate axis $x_1$?

5.5. Find linear transformations $A, B : \mathbb{R}^2 \to \mathbb{R}^2$ such that $AB = 0$ but $BA \ne 0$.


Often, the symbol $E$ is used in Linear Algebra textbooks for the identity
matrix. I prefer $I$, since it is used in operator theory.

5.6. Prove Theorem 5.1, i.e. prove that $\operatorname{trace}(AB) = \operatorname{trace}(BA)$.

5.7. Construct a non-zero matrix $A$ such that $A^2 = 0$.

5.8. Find the matrix of the reflection through the line $y = -2x/3$. Perform all the
multiplications.


6. Invertible transformations and matrices. Isomorphisms 


6.1. Identity transformation and identity matrix. Among all linear
transformations, there is a special one, the identity transformation (operator)
$I$, $I\mathbf{x} = \mathbf{x}$, $\forall \mathbf{x}$.

To be precise, there are infinitely many identity transformations: for
any vector space $V$, there is the identity transformation $I = I_V : V \to V$,
$I_V \mathbf{x} = \mathbf{x}$, $\forall \mathbf{x} \in V$. However, when it does not lead to confusion
we will use the same symbol $I$ for all identity operators (transformations).
We will use the notation $I_V$ only when we want to emphasize in what space the
transformation is acting.

Clearly, if $I : F^n \to F^n$ is the identity transformation in $F^n$, its matrix
is the $n \times n$ matrix
$$I = I_n = \begin{pmatrix} 1 & 0 & \ldots & 0 \\ 0 & 1 & \ldots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \ldots & 1 \end{pmatrix}$$
(1 on the main diagonal and 0 everywhere else). When we want to emphasize
the size of the matrix, we use the notation $I_n$; otherwise we just use $I$.

Clearly, for an arbitrary linear transformation $A$, the equalities
$$AI = A, \qquad IA = A$$
hold (whenever the product is defined).


6.2. Invertible transformations. 


Definition. Let $A : V \to W$ be a linear transformation. We say that
the transformation $A$ is left invertible if there exists a linear transformation
$B : W \to V$ such that
$$BA = I \qquad (I = I_V \text{ here}).$$

The transformation $A$ is called right invertible if there exists a linear
transformation $C : W \to V$ such that
$$AC = I \qquad (\text{here } I = I_W).$$

The transformations $B$ and $C$ are called left and right inverses of $A$. Note
that we did not assume the uniqueness of $B$ or $C$ here, and generally left
and right inverses are not unique.

Definition. A linear transformation $A : V \to W$ is called invertible if it is
both right and left invertible.

Theorem 6.1. If a linear transformation $A : V \to W$ is invertible, then its
left and right inverses $B$ and $C$ are unique and coincide.

Corollary. A transformation $A : V \to W$ is invertible if and only if there
exists a unique linear transformation (denoted $A^{-1}$), $A^{-1} : W \to V$ such
that
$$A^{-1}A = I_V, \qquad AA^{-1} = I_W.$$
The transformation $A^{-1}$ is called the inverse of $A$.

Proof of Theorem 6.1. Let $BA = I$ and $AC = I$. Then
$$BAC = B(AC) = BI = B.$$
On the other hand
$$BAC = (BA)C = IC = C,$$
and therefore $B = C$.

Suppose for some transformation $B_1$ we have $B_1 A = I$. Repeating the
above reasoning with $B_1$ instead of $B$ we get $B_1 = C$. Therefore the left
inverse $B$ is unique. The uniqueness of $C$ is proved similarly.


Definition. A matrix is called invertible (resp. left invertible, right invert- 
ible) if the corresponding linear transformation is invertible (resp. left in- 
vertible, right invertible). 


Theorem 6.1 asserts that a matrix $A$ is invertible if there exists a unique
matrix $A^{-1}$ such that $A^{-1}A = I$, $AA^{-1} = I$. The matrix $A^{-1}$ is called
(surprise) the inverse of $A$.

Examples.

1. The identity transformation (matrix) is invertible, $I^{-1} = I$;

2. The rotation $R_\gamma$
$$R_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix}$$
is invertible, and the inverse is given by $(R_\gamma)^{-1} = R_{-\gamma}$. This equality
is clear from the geometric description of $R_\gamma$, and it also can be
checked by the matrix multiplication;


3. The column $(1, 1)^T$ is left invertible but not right invertible. One of
the possible left inverses is the row $(1/2, 1/2)$.

To show that this matrix is not right invertible, we just notice
that there is more than one left inverse (if the column were also right
invertible, it would be invertible, and by Theorem 6.1 its left inverse
would be unique). Exercise: describe all left inverses of this matrix.

4. The row $(1, 1)$ is right invertible, but not left invertible. The column
$(1/2, 1/2)^T$ is a possible right inverse.

Remark 6.2. An invertible matrix must be square ($n \times n$). Moreover, if
a square matrix $A$ has either left or right inverse, it is invertible. So, it is
sufficient to check only one of the identities $AA^{-1} = I$, $A^{-1}A = I$.

This fact will be proved later. Until we prove this fact, we will not use
it. I presented it here only to stop students from trying wrong directions.

Very often this property is used as the definition of an invertible
transformation.

An invertible matrix must be square (to be proved later).

Inverse of a product: $(AB)^{-1} = B^{-1}A^{-1}$. Note the change of order.


6.2.1. Properties of the inverse transformation. 


Theorem 6.3 (Inverse of the product). If linear transformations $A$ and $B$
are invertible (and such that the product $AB$ is defined), then the product
$AB$ is invertible and
$$(AB)^{-1} = B^{-1}A^{-1}$$
(note the change of the order!)

Proof. Direct computation shows:
$$(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$$
and similarly
$$(B^{-1}A^{-1})(AB) = B^{-1}(A^{-1}A)B = B^{-1}IB = B^{-1}B = I.$$
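The change of order can be observed on a concrete pair of $2 \times 2$ matrices. The sketch below relies on the standard closed-form inverse of a $2 \times 2$ matrix, which is an assumption brought in for this illustration (the text has not derived it); the matrices and function names are likewise illustrative.

```python
from fractions import Fraction


def matmul(A, B):
    n = len(B)
    return [[sum(A[j][l] * B[l][k] for l in range(n)) for k in range(len(B[0]))]
            for j in range(len(A))]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the formula (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]

inv_of_product = inverse_2x2(matmul(A, B))
right_order = matmul(inverse_2x2(B), inverse_2x2(A))   # B^{-1} A^{-1}
wrong_order = matmul(inverse_2x2(A), inverse_2x2(B))   # A^{-1} B^{-1}

print(inv_of_product == right_order, inv_of_product == wrong_order)
```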


Remark 6.4. The invertibility of the product AB does not imply the in- 
vertibility of the factors A and B (can you think of an example?). However, 
if one of the factors (either A or B) and the product AB are invertible, then 
the second factor is also invertible. 


We leave the proof of this fact as an exercise. 
Theorem 6.5 (Inverse of $A^T$). If a matrix $A$ is invertible, then $A^T$ is also
invertible and
$$(A^T)^{-1} = (A^{-1})^T.$$

Proof. Using $(AB)^T = B^T A^T$ we get
$$(A^{-1})^T A^T = (AA^{-1})^T = I^T = I,$$
and similarly
$$A^T (A^{-1})^T = (A^{-1}A)^T = I^T = I.$$

And finally, if $A$ is invertible, then $A^{-1}$ is also invertible, $(A^{-1})^{-1} = A$.

So, let us summarize the main properties of the inverse:
1. If $A$ is invertible, then $A^{-1}$ is also invertible, $(A^{-1})^{-1} = A$;
2. If $A$ and $B$ are invertible and the product $AB$ is defined, then $AB$
is invertible and $(AB)^{-1} = B^{-1}A^{-1}$;
3. If $A$ is invertible, then $A^T$ is also invertible and $(A^T)^{-1} = (A^{-1})^T$.


6.3. Isomorphism. Isomorphic spaces. An invertible linear transformation
$A : V \to W$ is called an isomorphism. We did not introduce anything
new here, it is just another name for the object we already studied.

Two vector spaces $V$ and $W$ are called isomorphic (denoted $V \cong W$) if
there is an isomorphism $A : V \to W$.

Isomorphic spaces can be considered as different representations of the
same space, meaning that all properties and constructions involving vector
space operations are preserved under isomorphism.

The theorem below illustrates this statement.

Theorem 6.6. Let $A : V \to W$ be an isomorphism, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$
be a basis in $V$. Then the system $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ is a basis in $W$.

We leave the proof of the theorem as an exercise.

Remark. In the above theorem one can replace “basis” by “linearly
independent”, or “generating”, or “linearly dependent”: all these properties are
preserved under isomorphisms.

Remark. If $A$ is an isomorphism, then so is $A^{-1}$. Therefore in the above
theorem we can state that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis if and only if
$A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_n$ is a basis.

The converse to Theorem 6.6 is also true.

Theorem 6.7. Let $A : V \to W$ be a linear map, and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$
and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ be bases in $V$ and $W$ respectively. If $A\mathbf{v}_k = \mathbf{w}_k$,
$k = 1, 2, \ldots, n$, then $A$ is an isomorphism.

Proof. Define the inverse transformation $A^{-1}$ by $A^{-1}\mathbf{w}_k = \mathbf{v}_k$,
$k = 1, 2, \ldots, n$ (as we know, a linear transformation is defined by its values on a
basis).

Examples.

1. Let $A : F^{n+1} \to P_n^F$ ($P_n^F$ is the set of polynomials $\sum_{k=0}^{n} a_k t^k$, $a_k \in F$,
of degree at most $n$) be defined by
$$A\mathbf{e}_1 = 1, \quad A\mathbf{e}_2 = t, \quad \ldots, \quad A\mathbf{e}_n = t^{n-1}, \quad A\mathbf{e}_{n+1} = t^n.$$
By Theorem 6.7 $A$ is an isomorphism, so $P_n^F \cong F^{n+1}$.

2. Let $V$ be a vector space (over $F$) with a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$. Define
the transformation $A : F^n \to V$ by
$$A\mathbf{e}_k = \mathbf{v}_k, \qquad k = 1, 2, \ldots, n,$$
where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $F^n$. Again by Theorem
6.7 $A$ is an isomorphism, so $V \cong F^n$.

3. The space $M^F_{2 \times 3}$ of $2 \times 3$ matrices with entries in $F$ is isomorphic to
$F^6$;

4. More generally, $M^F_{m \times n} \cong F^{m \cdot n}$.

Any real vector space with a basis is isomorphic to $\mathbb{R}^n$ (for some $n$).
Similarly, any complex vector space with a basis is isomorphic to $\mathbb{C}^n$.

Doesn’t this remind you of a basis?

6.4. Invertibility and equations. 


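Theorem 6.8. Let $A : X \to Y$ be a linear transformation. Then $A$ is
invertible if and only if for any right side $\mathbf{b} \in Y$ the equation
$$A\mathbf{x} = \mathbf{b}$$
has a unique solution $\mathbf{x} \in X$.

Proof. Suppose $A$ is invertible. Then $\mathbf{x} = A^{-1}\mathbf{b}$ solves the equation $A\mathbf{x} = \mathbf{b}$.
To show that the solution is unique, suppose that for some other vector
$\mathbf{x}_1 \in X$
$$A\mathbf{x}_1 = \mathbf{b}.$$
Multiplying this identity by $A^{-1}$ from the left we get
$$A^{-1}A\mathbf{x}_1 = A^{-1}\mathbf{b},$$
and therefore $\mathbf{x}_1 = A^{-1}\mathbf{b} = \mathbf{x}$. Note that both identities, $AA^{-1} = I$ and
$A^{-1}A = I$, were used here.

Let us now suppose that the equation $A\mathbf{x} = \mathbf{b}$ has a unique solution $\mathbf{x}$
for any $\mathbf{b} \in Y$. Let us use the symbol $\mathbf{y}$ instead of $\mathbf{b}$. We know that given $\mathbf{y} \in Y$
the equation
$$A\mathbf{x} = \mathbf{y}$$
has a unique solution $\mathbf{x} \in X$. Let us call this solution $B(\mathbf{y})$.

Note that $B(\mathbf{y})$ is defined for all $\mathbf{y} \in Y$, so we defined a transformation
$B : Y \to X$.

Let us check that $B$ is a linear transformation. We need to show that
$B(\alpha \mathbf{y}_1 + \beta \mathbf{y}_2) = \alpha B(\mathbf{y}_1) + \beta B(\mathbf{y}_2)$. Let $\mathbf{x}_k := B(\mathbf{y}_k)$, $k = 1, 2$, i.e. $A\mathbf{x}_k = \mathbf{y}_k$,
$k = 1, 2$. Then
$$A(\alpha \mathbf{x}_1 + \beta \mathbf{x}_2) = \alpha A\mathbf{x}_1 + \beta A\mathbf{x}_2 = \alpha \mathbf{y}_1 + \beta \mathbf{y}_2,$$
which means
$$B(\alpha \mathbf{y}_1 + \beta \mathbf{y}_2) = \alpha \mathbf{x}_1 + \beta \mathbf{x}_2 = \alpha B(\mathbf{y}_1) + \beta B(\mathbf{y}_2).$$

And finally, let us show that $B$ is indeed the inverse of $A$. Take $\mathbf{x} \in X$
and let $\mathbf{y} = A\mathbf{x}$, so by the definition of $B$ we have $\mathbf{x} = B\mathbf{y}$. Then for all
$\mathbf{x} \in X$
$$BA\mathbf{x} = B\mathbf{y} = \mathbf{x},$$
so $BA = I$. Similarly, for arbitrary $\mathbf{y} \in Y$ let $\mathbf{x} = B\mathbf{y}$, so $\mathbf{y} = A\mathbf{x}$. Then for
all $\mathbf{y} \in Y$
$$AB\mathbf{y} = A\mathbf{x} = \mathbf{y},$$
so $AB = I$.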
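For a concrete invertible matrix the unique solution promised by Theorem 6.8 is $\mathbf{x} = A^{-1}\mathbf{b}$. The following sketch illustrates this for one $2 \times 2$ example; the matrix, the right side, and the closed-form $2 \times 2$ inverse used here are assumptions made for the illustration, not part of the proof.

```python
from fractions import Fraction


def matvec(A, x):
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix via the formula (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


A = [[2, 1],
     [1, 1]]
b = [3, 2]

x = matvec(inverse_2x2(A), b)   # x = A^{-1} b
print(x, matvec(A, x) == b)     # substituting x back recovers b
```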


Recalling the definition of a basis we get the following corollary of The- 
orems 6.6 and 6.7. 


Corollary 6.9. An $m \times n$ matrix is invertible if and only if its columns
form a basis in $F^m$.


Exercises. 


6.1. Prove, that if A: V — W is an isomorphism (i.e. an invertible linear trans- 
formation) and vj, V2,...,;Vn is a basis in V, then Avi, Avo,..., AVn is a basis in 


WwW. 


6.2. Find all right inverses to the $1 \times 2$ matrix (row) $A = (1, 1)$. Conclude from here that the row $A$ is not left invertible.


6.3. Find all left inverses of the column $(1, 2, 3)^T$.


6.4. Is the column $(1, 2, 3)^T$ right invertible? Justify.


6.5. Find two matrices $A$ and $B$ such that $AB$ is invertible, but $A$ and $B$ are not. Hint: square matrices $A$ and $B$ would not work. Remark: It is easy to construct such $A$ and $B$ in the case when $AB$ is a $1 \times 1$ matrix (a scalar). But can you get a $2 \times 2$ matrix $AB$? $3 \times 3$? $n \times n$?


6.6. Suppose the product AB is invertible. Show that A is right invertible and B 
is left invertible. Hint: you can just write formulas for right and left inverses. 


6.7. Let A and AB be invertible (assuming that the product AB is well defined). 
Prove that B is invertible. 


6.8. Let $A$ be an $n \times n$ matrix. Prove that if $A^2 = 0$ then $A$ is not invertible.


6.9. Suppose $AB = 0$ for some non-zero matrix $B$. Can $A$ be invertible? Justify.


6.10. Write matrices of the linear transformations $T_1$ and $T_2$ in $\mathbb{F}^5$, defined as follows: $T_1$ interchanges the coordinates $x_2$ and $x_4$ of the vector $\mathbf{x}$, and $T_2$ just adds to the coordinate $x_2$ $a$ times the coordinate $x_4$, and does not change other coordinates, i.e.

\[
T_1 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} x_1 \\ x_4 \\ x_3 \\ x_2 \\ x_5 \end{pmatrix},
\qquad
T_2 \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
= \begin{pmatrix} x_1 \\ x_2 + a x_4 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix};
\]

here $a$ is some fixed number.


Show that $T_1$ and $T_2$ are invertible transformations, and write the matrices of the inverses. Hint: it may be simpler if you first describe the inverse transformation, and then find its matrix, rather than trying to guess (or compute) the inverses of the matrices $T_1$, $T_2$.


6.11. Find the matrix of the rotation in $\mathbb{R}^3$ through the angle $\alpha$ around the vector $(1, 2, 3)^T$. We assume that the rotation is counterclockwise if we sit at the tip of the vector and look at the origin.


You can present the answer as a product of several matrices: you don’t have 
to perform the multiplication. 


6.12. Give examples of matrices (say $2 \times 2$) such that:
a) $A + B$ is not invertible although both $A$ and $B$ are invertible;
b) $A + B$ is invertible although both $A$ and $B$ are not invertible;
c) All of $A$, $B$ and $A + B$ are invertible.


6.13. Let $A$ be an invertible symmetric ($A^T = A$) matrix. Is the inverse of $A$ symmetric? Justify.


7. Subspaces. 


A subspace of a vector space $V$ is a non-empty subset $V_0 \subset V$ of $V$ which is closed under the vector addition and multiplication by scalars, i.e.


1. If $\mathbf{v} \in V_0$ then $\alpha\mathbf{v} \in V_0$ for all scalars $\alpha$;
2. For any $\mathbf{u}, \mathbf{v} \in V_0$ the sum $\mathbf{u} + \mathbf{v} \in V_0$;


Again, the conditions 1 and 2 can be replaced by the following one:


$\alpha\mathbf{u} + \beta\mathbf{v} \in V_0$ for all $\mathbf{u}, \mathbf{v} \in V_0$, and for all scalars $\alpha, \beta$.


Note, that a subspace $V_0 \subset V$ with the operations (vector addition and multiplication by scalars) inherited from $V$, is a vector space. Indeed, since $V_0$ is non-empty, it contains at least one vector $\mathbf{v}$, and since $\mathbf{0} = 0\mathbf{v}$, the above condition 1 implies that the zero vector $\mathbf{0}$ is in $V_0$. Also, for any $\mathbf{v} \in V_0$ its additive inverse $-\mathbf{v}$ is given by $-\mathbf{v} = (-1)\mathbf{v}$, so again by property 1, $-\mathbf{v} \in V_0$. The rest of the axioms of the vector space are satisfied because all operations are inherited from the vector space $V$. The only thing that could possibly go wrong is that the result of some operation does not belong to $V_0$. But the definition of a subspace prohibits this!


Now let us consider some examples: 


1. Trivial subspaces of a space $V$, namely $V$ itself and $\{\mathbf{0}\}$ (the subspace consisting only of the zero vector). Note, that the empty set $\varnothing$ is not a vector space, since it does not contain a zero vector, so it is not a subspace.

With each linear transformation $A: V \to W$ we can associate the following two subspaces:


2. The null space, or kernel of A, which is denoted as Null A or Ker A 
and consists of all vectors v € V such that Av = 0 


3. The range Ran A is defined as the set of all vectors w € W which 
can be represented as w = Av for some v € V. 


If $A$ is a matrix, i.e. $A: \mathbb{F}^m \to \mathbb{F}^n$, then recalling the "column by coordinate" rule of the matrix-vector multiplication, we can see that any vector $\mathbf{w} \in \operatorname{Ran} A$ can be represented as a linear combination of columns of the matrix $A$. That explains why the term column space (and the notation $\operatorname{Col} A$) is often used for the range of a matrix, in place of $\operatorname{Ran} A$.


And now the last example. 


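4. Given a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r \in V$ its linear span (sometimes called simply span) $\mathcal{L}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is the collection of all vectors $\mathbf{v} \in V$ that can be represented as a linear combination $\mathbf{v} = \alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \ldots + \alpha_r\mathbf{v}_r$ of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. The notation $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ is also used instead of $\mathcal{L}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$.


It is easy to check that in all of these examples we indeed have subspaces. We leave this as an exercise for the reader. Some of the statements will be proved later in the text.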


Exercises. 


7.1. Let $X$ and $Y$ be subspaces of a vector space $V$. Prove that $X \cap Y$ is a subspace of $V$.


7.2. Let $V$ be a vector space. For $X, Y \subset V$ the sum $X + Y$ is the collection of all vectors $\mathbf{v}$ which can be represented as $\mathbf{v} = \mathbf{x} + \mathbf{y}$, $\mathbf{x} \in X$, $\mathbf{y} \in Y$. Show that if $X$ and $Y$ are subspaces of $V$, then $X + Y$ is also a subspace.


7.3. Let $X$ be a subspace of a vector space $V$, and let $\mathbf{v} \in V$, $\mathbf{v} \notin X$. Prove that if $\mathbf{x} \in X$ then $\mathbf{x} + \mathbf{v} \notin X$.


7.4. Let $X$ and $Y$ be subspaces of a vector space $V$. Using the previous exercise, show that $X \cup Y$ is a subspace if and only if $X \subset Y$ or $Y \subset X$.


7.5. What is the smallest subspace of the space of $4 \times 4$ matrices which contains all upper triangular matrices ($a_{j,k} = 0$ for all $j > k$), and all symmetric matrices ($A = A^T$)? What is the largest subspace contained in both of those subspaces?


8. Application to computer graphics. 


In this section we give some ideas of how linear algebra is used in computer graphics. We will not go into the details, but just explain some ideas. In particular we explain why manipulations with 3-dimensional images are reduced to multiplications of $4 \times 4$ matrices.


8.1. 2-dimensional manipulation. The $x$-$y$ plane (more precisely, a rectangle there) is a good model of a computer monitor. Any object on a monitor is represented as a collection of pixels, and each pixel is assigned a specific color. The position of each pixel is determined by its column and row, which play the role of $x$ and $y$ coordinates on the plane. So a rectangle on a plane with $x$-$y$ coordinates is a good model for a computer screen, and a graphical object is just a collection of points.


Remark. There are two types of graphical objects: bitmap objects, where every pixel of an object is described, and vector objects, where we describe only critical points, and the graphics engine connects them to reconstruct the object. A (digital) photo is a good example of a bitmap object: every pixel of it is described. Bitmap objects can contain a lot of points, so manipulations with bitmaps require a lot of computing power. Anybody who has edited digital photos in a bitmap manipulation program, like Adobe Photoshop, knows that one needs quite a powerful computer, and even with modern and powerful computers manipulations can take some time.


That is the reason that most of the objects, appearing on a computer 
screen are vector ones: the computer only needs to memorize critical points. 
For example, to describe a polygon, one needs only to give the coordinates 
of its vertices, and which vertex is connected with which. Of course, not 
all objects on a computer screen can be represented as polygons, some, like 
letters, have curved smooth boundaries. But there are standard methods 
allowing one to draw smooth curves through a collection of points, for exam- 
ple Bezier splines, used in PostScript and Adobe PDF (and in many other 
formats). 


Anyhow, this is the subject of a completely different book, and we will 
not discuss it here. For us a graphical object will be a collection of points 
(either wireframe model, or bitmap) and we would like to show how one can 
perform some manipulations with such objects. 


The simplest transformation is a translation (shift), where each point (vector) $\mathbf{v}$ is translated by $\mathbf{a}$, i.e. the vector $\mathbf{v}$ is replaced by $\mathbf{v} + \mathbf{a}$ (notation $\mathbf{v} \mapsto \mathbf{v} + \mathbf{a}$ is used for this). Vector addition is very well adapted to computers, so the translation is easy to implement.


Note, that the translation is not a linear transformation (if $\mathbf{a} \ne \mathbf{0}$): while it preserves the straight lines, it does not preserve $\mathbf{0}$.


All other transformations used in computer graphics are linear. The first one that comes to mind is rotation. The rotation by $\gamma$ around the origin $\mathbf{0}$ is given by the multiplication by the rotation matrix $R_\gamma$ we discussed above,

\[
R_\gamma = \begin{pmatrix} \cos\gamma & -\sin\gamma \\ \sin\gamma & \cos\gamma \end{pmatrix}.
\]

If we want to rotate around a point $\mathbf{a}$, we first need to translate the picture by $-\mathbf{a}$, moving the point $\mathbf{a}$ to $\mathbf{0}$, then rotate around $\mathbf{0}$ (multiply by $R_\gamma$) and then translate everything back by $\mathbf{a}$.
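
The translate, rotate, translate back recipe can be sketched in a few lines. This is an illustration only; the function names `rotate` and `rotate_about` are made up for this example, and points are represented as plain pairs of floats.

```python
import math


def rotate(v, gamma):
    """Rotate the point v = (x, y) by the angle gamma around the origin."""
    x, y = v
    c, s = math.cos(gamma), math.sin(gamma)
    # This is multiplication by the rotation matrix [[c, -s], [s, c]].
    return (c * x - s * y, s * x + c * y)


def rotate_about(v, a, gamma):
    """Rotate the point v by the angle gamma around the point a."""
    shifted = (v[0] - a[0], v[1] - a[1])   # translate by -a, so a moves to 0
    turned = rotate(shifted, gamma)        # rotate around the origin
    return (turned[0] + a[0], turned[1] + a[1])  # translate back by a
```

For instance, rotating the point (2, 1) by a quarter turn around (1, 1) gives (1, 2) up to rounding, and the center (1, 1) itself stays fixed for any angle.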


Another very useful transformation is scaling, given by a matrix

\[
\begin{pmatrix} a & 0 \\ 0 & b \end{pmatrix},
\]

$a, b > 0$. If $a = b$ it is uniform scaling which enlarges (reduces) an object, preserving its shape. If $a \ne b$ then $x$ and $y$ coordinates scale differently; the object becomes "taller" or "wider".


Another often used transformation is reflection: for example the matrix

\[
\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\]

defines the reflection through the $x$-axis.


We will show later in the book, that any linear transformation in $\mathbb{R}^2$ can be represented as a composition of scalings, rotations and reflections. However it is sometimes convenient to consider some different transformations, like the shear transformation, given by the matrix

\[
\begin{pmatrix} 1 & \tan\varphi \\ 0 & 1 \end{pmatrix}.
\]

This transformation makes all objects slanted: the horizontal lines remain horizontal, but vertical lines go to the slanted lines at the angle $\varphi$ to the horizontal ones.


8.2. 3-dimensional graphics. Three-dimensional graphics is more complicated. First we need to be able to manipulate 3-dimensional objects, and then we need to represent them on a 2-dimensional plane (monitor).


The manipulation of 3-dimensional objects is pretty straightforward: we have the same basic transformations, namely translation, reflection through a plane, scaling, rotation. Matrices of these transformations are very similar to the matrices of their $2 \times 2$ counterparts. For example the matrices

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{pmatrix}, \qquad
\begin{pmatrix} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{pmatrix}, \qquad
\begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

represent respectively reflection through the $x$-$y$ plane, scaling, and rotation around the $z$-axis.
Note, that the above rotation is essentially a 2-dimensional transformation: it does not change the $z$ coordinate. Similarly, one can write matrices for the other 2 elementary rotations around the $x$ and around the $y$ axes. It will be shown later that a rotation around an arbitrary axis can be represented as a composition of elementary rotations.


So, we know how to manipulate 3-dimensional objects. Let us now discuss how to represent such objects on a 2-dimensional plane. The simplest way is to project it to a plane, say to the $x$-$y$ plane. To perform such a projection one just needs to replace the $z$ coordinate by 0; the matrix of this projection is

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.
\]


Such method is often used in technical illustrations. Rotating an object 
and projecting it is equivalent to looking at it from different points. However, 
this method does not give a very realistic picture, because it does not take 
into account the perspective, the fact that the objects that are further away 
look smaller. 


To get a more realistic picture one needs to use the so-called perspective projection. To define a perspective projection one needs to pick a point (the center of projection or the focal point) and a plane to project onto. Then each point in $\mathbb{R}^3$ is projected into a point on the plane such that the point, its image and the center of the projection lie on the same line, see Fig. 2.


This is exactly how a camera works, and it is a reasonable first approx- 
imation of how our eyes work. 


Let us get a formula for the projection. Assume that the focal point is $(0, 0, d)^T$ and that we are projecting onto the $x$-$y$ plane, see Fig. 3 a). Consider a point $\mathbf{v} = (x, y, z)^T$, and let $\mathbf{v}^* = (x^*, y^*, 0)^T$ be its projection. Analyzing similar triangles, see Fig. 3 b), we get that

\[
\frac{x^*}{d} = \frac{x}{d - z},
\]

so

\[
x^* = \frac{xd}{d - z} = \frac{x}{1 - z/d},
\]

and similarly

\[
y^* = \frac{y}{1 - z/d}.
\]

Note, that this formula also works if $z > d$ and if $z < 0$: you can draw the corresponding similar triangles to check it.
Figure 2. Perspective projection onto the $x$-$y$ plane: $F$ is the center (focal point) of the projection.


Figure 3. Finding coordinates $x^*$, $y^*$ of the perspective projection of the point $(x, y, z)^T$ (panels a and b).


Thus the perspective projection maps a point $(x, y, z)^T$ to the point

\[
\left( \frac{x}{1 - z/d}, \; \frac{y}{1 - z/d}, \; 0 \right)^T.
\]

This transformation is definitely not linear (because of z in the denomi- 
nator). However it is still possible to represent it as a linear transformation. 
To do this let us introduce the so-called homogeneous coordinates. 
In the homogeneous coordinates, every point in $\mathbb{R}^3$ is represented by 4 coordinates, the last, 4th coordinate playing the role of the scaling coefficient. Thus, to get the usual 3-dimensional coordinates of the vector $\mathbf{v} = (x, y, z)^T$ from its homogeneous coordinates $(x_1, x_2, x_3, x_4)^T$ one needs to divide all entries by the last coordinate $x_4$ and take the first 3 coordinates (if $x_4 = 0$ this recipe does not work, so we assume that the case $x_4 = 0$ corresponds to the point at infinity).


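Thus in homogeneous coordinates the vector $\mathbf{v}^*$ can be represented as $(x, y, 0, 1 - z/d)^T$, so in homogeneous coordinates the perspective projection is a linear transformation:

\[
\begin{pmatrix} x \\ y \\ 0 \\ 1 - z/d \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & -1/d & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\]

Note that in the homogeneous coordinates the translation is also a linear transformation:

\[
\begin{pmatrix} x + a_1 \\ y + a_2 \\ z + a_3 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix}.
\]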


But what happens if the center of projection is not a point $(0, 0, d)^T$ but some arbitrary point $(d_1, d_2, d_3)^T$? Then we first need to apply the translation by $-(d_1, d_2, 0)^T$ to move the center to $(0, 0, d_3)^T$ while preserving the $x$-$y$ plane, apply the projection, and then move everything back translating it by $(d_1, d_2, 0)^T$. Similarly, if the plane we project to is not the $x$-$y$ plane, we move it to the $x$-$y$ plane by using rotations and translations, and so on.

All these operations are just multiplications by 4 x 4 matrices. That 
explains why modern graphic cards have 4 x 4 matrix operations embedded 
in the processor. 
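
The chain of $4 \times 4$ multiplications described above can be sketched as follows. This is only an illustration under stated assumptions: all function names (`perspective_matrix`, `translation_matrix`, `projection_with_center`, and so on) are invented here, matrices are nested lists, and exact fractions are used so the results can be compared without rounding.

```python
from fractions import Fraction


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m, v):
    """Product of a matrix and a column vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def to_homogeneous(v):
    """(x, y, z) -> (x, y, z, 1)."""
    return list(v) + [1]


def from_homogeneous(h):
    """Divide by the 4th coordinate and keep the first three."""
    if h[3] == 0:
        raise ValueError("x4 = 0 corresponds to a point at infinity")
    return [c / h[3] for c in h[:3]]


def perspective_matrix(d):
    """Perspective projection onto the x-y plane with center (0, 0, d)."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0], [0, 0, -Fraction(1) / d, 1]]


def translation_matrix(a1, a2, a3):
    """Translation by (a1, a2, a3) in homogeneous coordinates."""
    return [[1, 0, 0, a1], [0, 1, 0, a2], [0, 0, 1, a3], [0, 0, 0, 1]]


def projection_with_center(d1, d2, d3):
    """Translate the center to (0, 0, d3), project, translate back."""
    forward = translation_matrix(-d1, -d2, 0)
    back = translation_matrix(d1, d2, 0)
    return matmul(back, matmul(perspective_matrix(d3), forward))
```

As a check, with center $(0, 0, 2)^T$ the point $(1, 2, 1)^T$ has $1 - z/d = 1/2$ and goes to $(2, 4, 0)^T$; with center $(1, 1, 2)^T$ the point $(2, 3, 1)^T$ goes to $(3, 5, 0)^T$, which is where the line through the center and the point meets the $x$-$y$ plane.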


Of course, here we only touched the mathematics behind 3-dimensional 
graphics, there is much more. For example, how to determine which parts of 
the object are visible and which are hidden, how to make realistic lighting, 
shades, etc. 


3 If we multiply homogeneous coordinates of a point in $\mathbb{R}^3$ by a non-zero scalar, we do not change the point. In other words, in homogeneous coordinates a point in $\mathbb{R}^3$ is represented by a line through $\mathbf{0}$ in $\mathbb{R}^4$.

Exercises.

8.1. What vector in $\mathbb{R}^3$ has homogeneous coordinates $(10, 20, 30, 5)^T$?


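8.2. Show that a rotation through $\gamma$ can be represented as a composition of two shear-and-scale transformations

\[
T_1 = \begin{pmatrix} 1 & 0 \\ \sin\gamma & \cos\gamma \end{pmatrix}, \qquad
T_2 = \begin{pmatrix} \sec\gamma & -\tan\gamma \\ 0 & 1 \end{pmatrix}.
\]

In what order should the transformations be taken?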

8.3. Multiplication of a 2-vector by an arbitrary $2 \times 2$ matrix usually requires 4 multiplications.


Suppose a $2 \times 1000$ matrix $D$ contains coordinates of 1000 points in $\mathbb{R}^2$. How many multiplications are required to transform these points using 2 arbitrary $2 \times 2$ matrices $A$ and $B$? Compare 2 possibilities, $A(BD)$ and $(AB)D$.


8.4. Write the $4 \times 4$ matrix performing perspective projection to the $x$-$y$ plane with center $(d_1, d_2, d_3)^T$.

8.5. A transformation $T$ in $\mathbb{R}^3$ is a rotation about the line $y = x + 3$ in the $x$-$y$ plane through an angle $\gamma$. Write a $4 \times 4$ matrix corresponding to this transformation.


You can leave the result as a product of matrices. 


Chapter 2 


Systems of linear 
equations 


1. Different faces of linear systems. 


There exist several points of view on what a system of linear equations, or in 
short a linear system is. The first, naive one is, that it is simply a collection 


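of $m$ linear equations with $n$ unknowns $x_1, x_2, \ldots, x_n$,

\[
\begin{cases}
a_{1,1}x_1 + a_{1,2}x_2 + \ldots + a_{1,n}x_n = b_1 \\
a_{2,1}x_1 + a_{2,2}x_2 + \ldots + a_{2,n}x_n = b_2 \\
\qquad \ldots \\
a_{m,1}x_1 + a_{m,2}x_2 + \ldots + a_{m,n}x_n = b_m .
\end{cases}
\]

To solve the system is to find all $n$-tuples of numbers $x_1, x_2, \ldots, x_n$ which satisfy all $m$ equations simultaneously.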
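
If we denote $\mathbf{x} := (x_1, x_2, \ldots, x_n)^T \in \mathbb{F}^n$, $\mathbf{b} = (b_1, b_2, \ldots, b_m)^T \in \mathbb{F}^m$, and

\[
A = \begin{pmatrix}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n}
\end{pmatrix},
\]

then the above linear system can be written in the matrix form (as a matrix-vector equation)

\[ A\mathbf{x} = \mathbf{b}. \]

To solve the above equation is to find all vectors $\mathbf{x} \in \mathbb{F}^n$ satisfying $A\mathbf{x} = \mathbf{b}$.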

And finally, recalling the "column by coordinate" rule of the matrix-vector multiplication, we can write the system as a vector equation

\[ x_1\mathbf{a}_1 + x_2\mathbf{a}_2 + \ldots + x_n\mathbf{a}_n = \mathbf{b}, \]

where $\mathbf{a}_k$ is the $k$th column of the matrix $A$, $\mathbf{a}_k = (a_{1,k}, a_{2,k}, \ldots, a_{m,k})^T$, $k = 1, 2, \ldots, n$.

Note, these three examples are essentially just different representations 
of the same mathematical object. 


Before explaining how to solve a linear system, let us notice that it does not matter what we call the unknowns, $x_k$, $y_k$ or something else. So, all the information necessary to solve the system is contained in the matrix $A$, which is called the coefficient matrix of the system, and in the vector (right side) $\mathbf{b}$. Hence, all the information we need is contained in the following matrix

\[
\left(\begin{array}{cccc|c}
a_{1,1} & a_{1,2} & \ldots & a_{1,n} & b_1 \\
a_{2,1} & a_{2,2} & \ldots & a_{2,n} & b_2 \\
\vdots & \vdots & & \vdots & \vdots \\
a_{m,1} & a_{m,2} & \ldots & a_{m,n} & b_m
\end{array}\right)
\]

which is obtained by attaching the column $\mathbf{b}$ to the matrix $A$. This matrix is called the augmented matrix of the system. We will usually put the vertical line separating $A$ and $\mathbf{b}$ to distinguish between the augmented matrix and the coefficient matrix.


2. Solution of a linear system. Echelon and reduced echelon 
forms 


Linear systems are solved by the Gauss-Jordan elimination (which is sometimes called row reduction). By performing operations on rows of the augmented matrix of the system (i.e. on the equations), we reduce it to a simple form, the so-called echelon form. When the system is in the echelon form, one can easily write the solution.


2.1. Row operations. There are three types of row operations we use: 


1. Row exchange: interchange two rows of the matrix;
2. Scaling: multiply a row by a non-zero scalar $a$;
3. Row replacement: replace a row #$k$ by its sum with a constant multiple of a row #$j$; all other rows remain intact.


It is clear that the operations 1 and 2 do not change the set of solutions 
of the system; they essentially do not change the system. 

As for the operation 3, one can easily see that it does not lose solutions. Namely, let a "new" system be obtained from an "old" one by a row operation of type 3. Then any solution of the "old" system is a solution of the "new" one.


To see that we do not gain anything extra, i.e. that any solution of the "new" system is also a solution of the "old" one, we just notice that row operations of type 3 are reversible, i.e. the "old" system also can be obtained from the "new" one by applying a row operation of type 3 (can you say which one?).

2.1.1. Row operations and multiplication by elementary matrices. There is 
another, more “advanced” explanation why the above row operations are 
legal. Namely, every row operation is equivalent to the multiplication of the 
matrix from the left by one of the special elementary matrices. 


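Namely, the multiplication by the matrix

\[
\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\]

(the identity matrix $I$ with the rows number $j$ and number $k$ interchanged) just interchanges the rows number $j$ and number $k$. Multiplication by the matrix

\[
\begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & a & & \\
& & & \ddots & \\
& & & & 1
\end{pmatrix}
\]

(the identity matrix with the diagonal entry in row $k$ replaced by $a$) multiplies the row number $k$ by $a$. Finally, multiplication by the matrix

\[
\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & \vdots & \ddots & & & \\
& & a & \cdots & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\]

(the identity matrix with an extra entry $a$ in row $k$, column $j$) adds to the row #$k$ row #$j$ multiplied by $a$, and leaves all other rows intact.


A way to describe (or to remember) these elementary matrices: they are obtained from $I$ by applying the corresponding row operation to it.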


To see, that the multiplication by these matrices works as advertised, 
one can just see how the multiplications act on vectors (columns). 


Note that all these matrices are invertible (compare with reversibility of 
row operations). The inverse of the first matrix is the matrix itself. To get 
the inverse of the second one, one just replaces a by 1/a. And finally, the 
inverse of the third matrix is obtained by replacing a by —a. To see that 
the inverses are indeed obtained this way, one again can simply check how 
they act on columns. 

So, performing a row operation on the augmented matrix of the system 
Ax = b is equivalent to the multiplication of the system (from the left) by 
a special invertible matrix E. Left multiplying the equality Ax = b by E 
we get that any solution of the equation 


Ax=b 
is also a solution of 
EAx = Eb. 


Multiplying this equation (from the left) by $E^{-1}$ we get that any of its solutions is a solution of the equation

\[ E^{-1}EA\mathbf{x} = E^{-1}E\mathbf{b}, \]


which is the original equation Ax = b. So, a row operation does not change 
the solution set of a system. 
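
The claim that row operations are left multiplications by elementary matrices is easy to test on a small matrix. The sketch below is an illustration only: the function names are invented for this example, and row and column indices are 0-based (unlike the 1-based numbering in the text). Each elementary matrix is built exactly as described, by applying the row operation to the identity matrix.

```python
def identity(n):
    """The n x n identity matrix as nested lists."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def swap_matrix(n, j, k):
    """Identity with rows j and k interchanged (row exchange)."""
    e = identity(n)
    e[j], e[k] = e[k], e[j]
    return e


def scale_matrix(n, k, a):
    """Identity with the diagonal entry of row k replaced by a (scaling)."""
    e = identity(n)
    e[k][k] = a
    return e


def replacement_matrix(n, k, j, a):
    """Identity with an extra entry a in row k, column j (row replacement, j != k)."""
    e = identity(n)
    e[k][j] = a
    return e


A = [[1, 2], [3, 4], [5, 6]]   # arbitrary test matrix, not from the text
```

Multiplying `A` from the left by `swap_matrix(3, 0, 2)` interchanges its first and last rows, `scale_matrix(3, 1, 2)` doubles the middle row, and `replacement_matrix(3, 2, 0, -5)` subtracts 5 times the first row from the last one. Replacing `a` by `-a` in the third matrix gives its inverse, in agreement with the reversibility of row operations.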


2.2. Row reduction. The main step of row reduction consists of three 
sub-steps: 


1. Find the leftmost non-zero column of the matrix; 


2. Make sure, by applying row operations of type 1 (row exchange), if 
necessary, that the first (the upper) entry of this column is non-zero. 
This entry will be called the pivot entry or simply the pivot; 
3. "Kill" (i.e. make them 0) all non-zero entries below the pivot by adding (subtracting) an appropriate multiple of the first row from the rows number $2, 3, \ldots, m$.


We apply the main step to a matrix, then we leave the first row alone and 
apply the main step to rows 2,...,m, then to rows 3,...,m, etc. 


The point to remember is that after we subtract a multiple of a row from 
all rows below it (step 3), we leave it alone and do not change it in any way, 
not even interchange it with another row. 

After applying the main step finitely many times (at most m), we get 
what is called the echelon form of the matrix. 
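
The forward phase described above can be sketched as a short routine. This is a minimal illustration, not a production algorithm: the name `echelon_form` is made up, exact fractions are used to avoid rounding, and the first non-zero entry found in the column is taken as the pivot (no scaling of rows is performed, so the result may differ from a hand computation by row scalings).

```python
from fractions import Fraction


def echelon_form(matrix):
    """Return (rows, pivots): an echelon form of `matrix` and its pivot positions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m = len(rows)
    n = len(rows[0]) if m else 0
    pivots = []
    top = 0  # rows above `top` are finished and are never touched again
    for col in range(n):
        if top == m:
            break
        # Sub-steps 1 and 2: find a non-zero entry in this column at or below `top`.
        pivot_row = next((r for r in range(top, m) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue  # this column is zero in the remaining rows
        rows[top], rows[pivot_row] = rows[pivot_row], rows[top]  # row exchange
        # Sub-step 3: kill the entries below the pivot by row replacement.
        for r in range(top + 1, m):
            factor = rows[r][col] / rows[top][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[top])]
        pivots.append((top, col))
        top += 1
    return rows, pivots
```

For the augmented matrix with rows (1, 2, 3 | 1), (3, 2, 1 | 7), (2, 1, 2 | 1) this routine returns the rows (1, 2, 3 | 1), (0, −4, −8 | 4), (0, 0, 2 | −4), with pivots in the first three columns.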

2.2.1. An example of row reduction. Let us consider the following linear 
system: 

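\[
\begin{cases}
x_1 + 2x_2 + 3x_3 = 1 \\
3x_1 + 2x_2 + x_3 = 7 \\
2x_1 + x_2 + 2x_3 = 1 .
\end{cases}
\]

The augmented matrix of the system is

\[
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 2 & 1 & 7 \\ 2 & 1 & 2 & 1 \end{array}\right).
\]

Subtracting 3·Row#1 from the second row, and subtracting 2·Row#1 from the third one we get:

\[
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 3 & 2 & 1 & 7 \\ 2 & 1 & 2 & 1 \end{array}\right)
\begin{matrix} \\ -3R_1 \\ -2R_1 \end{matrix}
\sim
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & -4 & -8 & 4 \\ 0 & -3 & -4 & -1 \end{array}\right).
\]

Multiplying the second row by $-1/4$ we get

\[
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & -1 \\ 0 & -3 & -4 & -1 \end{array}\right).
\]

Adding 3·Row#2 to the third row we obtain

\[
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & -1 \\ 0 & -3 & -4 & -1 \end{array}\right)
\begin{matrix} \\ \\ +3R_2 \end{matrix}
\sim
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 2 & -4 \end{array}\right).
\]
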

Now we can use the so-called back substitution to solve the system. Namely, from the last row (equation) we get $x_3 = -2$. Then from the second equation we get

\[ x_2 = -1 - 2x_3 = -1 - 2(-2) = 3, \]

and finally, from the first row (equation)

\[ x_1 = 1 - 2x_2 - 3x_3 = 1 - 6 + 6 = 1. \]


Pivots: leading 
(leftmost non-zero 
entries) in a row. 
So, the solution is

\[
\begin{cases} x_1 = 1, \\ x_2 = 3, \\ x_3 = -2, \end{cases}
\]

or in vector form

\[
\mathbf{x} = \begin{pmatrix} 1 \\ 3 \\ -2 \end{pmatrix}
\]

or $\mathbf{x} = (1, 3, -2)^T$. We can check the solution by multiplying $A\mathbf{x}$, where $A$ is the coefficient matrix.
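
Back substitution for a triangular system, together with the check by multiplication, can be sketched as follows. The function names `back_substitute` and `apply` are invented for this illustration; the routine assumes a square triangular coefficient matrix with non-zero diagonal entries, as in the example above.

```python
from fractions import Fraction


def back_substitute(aug):
    """Solve a triangular system given by its n x (n+1) augmented matrix."""
    n = len(aug)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):          # work from the last equation up
        known = sum(Fraction(aug[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(aug[i][n]) - known) / Fraction(aug[i][i])
    return x


def apply(a, x):
    """Multiply the matrix a by the column vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


triangular = [[1, 2, 3, 1], [0, 1, 2, -1], [0, 0, 2, -4]]   # echelon form from the example
A = [[1, 2, 3], [3, 2, 1], [2, 1, 2]]                        # original coefficient matrix
x = back_substitute(triangular)
```

The computed `x` is $(1, 3, -2)^T$, and `apply(A, x)` reproduces the right side $(1, 7, 1)^T$ of the original system.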


Instead of using back substitution, we can do row reduction from bottom 
to top, killing all the entries above the main diagonal of the coefficient 
matrix: we start by multiplying the last row by 1/2, and the rest is pretty 
self-explanatory: 


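\[
\left(\begin{array}{ccc|c} 1 & 2 & 3 & 1 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 1 & -2 \end{array}\right)
\begin{matrix} -3R_3 \\ -2R_3 \\ \ \end{matrix}
\sim
\left(\begin{array}{ccc|c} 1 & 2 & 0 & 7 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -2 \end{array}\right)
\begin{matrix} -2R_2 \\ \ \\ \ \end{matrix}
\sim
\left(\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -2 \end{array}\right)
\]


and we just read the solution $\mathbf{x} = (1, 3, -2)^T$ off the augmented matrix.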


We leave it as an exercise to the reader to formulate the algorithm for 
the backward phase of the row reduction. 


2.3. Echelon form. A matrix is in echelon form if it satisfies the following 
two conditions: 


1. All zero rows (i.e. the rows with all entries equal to 0), if any, are below all non-zero rows.


For a non-zero row, let us call the leftmost non-zero entry the leading entry. 
Then the second property of the echelon form can be formulated as follows: 


2. For any non-zero row its leading entry is strictly to the right of the 
leading entry in the previous row. 


The leading entry in each row in echelon form is also called pivot entry, 
or simply pivot, because these entries are exactly the pivots we used in the 
row reduction. 


A particular case of the echelon form is the so-called triangular form. We got this form in our example above. In this form the coefficient matrix is square ($n \times n$), all its entries on the main diagonal are non-zero, and all the entries below the main diagonal are zero. The right side, i.e. the rightmost column of the augmented matrix can be arbitrary.


After the backward phase of the row reduction, we get the so-called reduced echelon form of the matrix: coefficient matrix equal to $I$, as in the above example, is a particular case of the reduced echelon form.


The general definition is as follows: we say that a matrix is in the reduced 
echelon form, if it is in the echelon form and 


3. All pivot entries are equal to 1;


4. All entries above the pivots are 0. Note, that all entries below the 
pivots are also 0 because of the echelon form. 


To get reduced echelon form from echelon form, we work from the bottom 
to the top and from the right to the left, using row replacement to kill all 
entries above the pivots. 


An example of the reduced echelon form is the system with the coefficient matrix equal to $I$. In this case, one just reads the solution from the reduced echelon form. In the general case, one can also easily read the solution from the reduced echelon form. For example, let the reduced echelon form of the system (augmented matrix) be

\[
\left(\begin{array}{ccccc|c}
\boxed{1} & 2 & 0 & 0 & 0 & 1 \\
0 & 0 & \boxed{1} & 5 & 0 & 2 \\
0 & 0 & 0 & 0 & \boxed{1} & 3
\end{array}\right);
\]

here we boxed the pivots. The idea is to move the variables, corresponding to the columns without pivot (the so-called free variables) to the right side. Then we can just write the solution.


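\[
\begin{cases}
x_1 = 1 - 2x_2 \\
x_2 \text{ is free} \\
x_3 = 2 - 5x_4 \\
x_4 \text{ is free} \\
x_5 = 3
\end{cases}
\]

or in the vector form

\[
\mathbf{x} = \begin{pmatrix} 1 - 2x_2 \\ x_2 \\ 2 - 5x_4 \\ x_4 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \\ 2 \\ 0 \\ 3 \end{pmatrix}
+ x_2 \begin{pmatrix} -2 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ x_4 \begin{pmatrix} 0 \\ 0 \\ -5 \\ 1 \\ 0 \end{pmatrix},
\qquad x_2, x_4 \in \mathbb{F}.
\]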

One can also find the solution from the echelon form by using back sub- 
stitution: the idea is to work from bottom to top, moving all free variables 
to the right side. 
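
Reading the solution off a reduced echelon form is mechanical enough to sketch in code. The function below is an illustration only (its name and return format are invented here); it assumes the augmented matrix is already in reduced echelon form and that the system is consistent. It returns one particular solution (all free variables set to 0) and, for each free variable, the vector that multiplies it in the vector form of the solution.

```python
from fractions import Fraction


def general_solution(rref_aug):
    """Return (particular, directions) for a consistent system in reduced echelon form."""
    n = len(rref_aug[0]) - 1                       # number of unknowns
    pivot_cols = []
    for row in rref_aug:
        lead = next((j for j in range(n) if row[j] != 0), None)
        if lead is not None:                       # zero rows carry no pivot
            pivot_cols.append(lead)
    free_cols = [j for j in range(n) if j not in pivot_cols]

    particular = [Fraction(0)] * n
    for i, p in enumerate(pivot_cols):
        particular[p] = Fraction(rref_aug[i][n])   # pivot variable = right side

    directions = {}
    for f in free_cols:
        d = [Fraction(0)] * n
        d[f] = Fraction(1)                         # the free variable itself
        for i, p in enumerate(pivot_cols):
            d[p] = -Fraction(rref_aug[i][f])       # moved to the right side
        directions[f] = d
    return particular, directions


example = [[1, 2, 0, 0, 0, 1],
           [0, 0, 1, 5, 0, 2],
           [0, 0, 0, 0, 1, 3]]
particular, directions = general_solution(example)
```

For the matrix from the text this gives the particular solution $(1, 0, 2, 0, 3)^T$ and the vectors $(-2, 1, 0, 0, 0)^T$ and $(0, 0, -5, 1, 0)^T$ attached to the free variables $x_2$ and $x_4$ (stored under the 0-based column indices 1 and 3).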
Exercises.

2.1. Write the systems of equations below in matrix form and as vector equations:

a)
\[
\begin{cases}
x_1 + 2x_2 - x_3 = -1 \\
2x_1 + 2x_2 + x_3 = 1 \\
3x_1 + 5x_2 - 2x_3 = -1
\end{cases}
\]

b)
\[
\begin{cases}
x_1 - 2x_2 - x_3 = 1 \\
2x_1 - 3x_2 + x_3 = 6 \\
3x_1 - 5x_2 = 7 \\
x_1 + 5x_3 = 9
\end{cases}
\]

c)
\[
\begin{cases}
x_1 + 2x_2 + 2x_4 = 6 \\
3x_1 + 5x_2 - x_3 + 6x_4 = 17 \\
2x_1 + 4x_2 + x_3 + 2x_4 = 12 \\
2x_1 - 7x_3 + 11x_4 = 7
\end{cases}
\]

d)
\[
\begin{cases}
x_1 - 4x_2 - x_3 + x_4 = 3 \\
2x_1 - 8x_2 + x_3 - 4x_4 = 9 \\
-x_1 + 4x_2 - 2x_3 + 5x_4 = -6
\end{cases}
\]

e)
\[
\begin{cases}
x_1 + 2x_2 - x_3 + 3x_4 = 2 \\
2x_1 + 4x_2 - x_3 + 6x_4 = 5 \\
x_2 + 2x_4 = 3
\end{cases}
\]

f)
\[
\begin{cases}
2x_1 - 2x_2 - x_3 + 6x_4 - 2x_5 = 1 \\
x_1 - x_2 + x_3 + 2x_4 - x_5 = 2 \\
4x_1 - 4x_2 + 5x_3 + 7x_4 - x_5 = 6
\end{cases}
\]

g)
\[
\begin{cases}
3x_1 - x_2 + x_3 - x_4 + 2x_5 = 5 \\
x_1 - x_2 - x_3 - 2x_4 - x_5 = 2 \\
5x_1 - 2x_2 + x_3 - 3x_4 + 3x_5 = 10 \\
2x_1 - x_2 - 2x_4 + x_5 = 5
\end{cases}
\]


Solve the systems. Write the answers in the vector form.


2.2. Find all solutions of the vector equation 


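\[ x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + x_3\mathbf{v}_3 = \mathbf{0}, \]


where $\mathbf{v}_1 = (1, 1, 0)^T$, $\mathbf{v}_2 = (0, 1, 1)^T$ and $\mathbf{v}_3 = (1, 0, 1)^T$. What conclusion can you make about linear independence (dependence) of the system of vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$?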


3. Analyzing the pivots. 


All questions about existence of a solution and its uniqueness can be answered by analyzing pivots in the echelon (reduced echelon) form of the augmented matrix of the system. First of all, let us investigate the question of when the equation $A\mathbf{x} = \mathbf{b}$ is inconsistent, i.e. when it does not have a solution. The answer follows immediately, if one just thinks about it:




a system is inconsistent (does not have a solution) if and only if there is a pivot in the last column of an echelon form of the augmented matrix, i.e. iff an echelon form of the augmented matrix has a row $(\,0 \;\; 0 \;\; \ldots \;\; 0 \mid b\,)$, $b \ne 0$, in it.


Indeed, such a row corresponds to the equation $0x_1 + 0x_2 + \ldots + 0x_n = b \ne 0$ that does not have a solution. If we don't have such a row, we just make the reduced echelon form and then read the solution off it.


Now, three more statements. Note, they all deal with the coefficient 
matrix, and not with the augmented matrix of the system. 


1. A solution (if it exists) is unique iff there are no free variables, that 
is if and only if the echelon form of the coefficient matrix has a pivot 
in every column; 

2. Equation Ax = b is consistent for all right sides b if and only if the 
echelon form of the coefficient matrix has a pivot in every row. 


3. Equation Ax = b has a unique solution for any right side b if and 
only if echelon form of the coefficient matrix A has a pivot in every 
column and every row. 


The first statement is trivial, because free variables are responsible for 
all non-uniqueness. I should only emphasize that this statement does not 
say anything about the existence. 


The second statement is a tiny bit more complicated. If we have a pivot 
in every row of the coefficient matrix, we cannot have the pivot in the last 
column of the augmented matrix, so the system is always consistent, no 
matter what the right side b is. 

Let us show that if we have a zero row in the echelon form of the coefficient matrix $A$, then we can pick a right side $\mathbf{b}$ such that the system $A\mathbf{x} = \mathbf{b}$ is not consistent. Let $A_e$ be the echelon form of the coefficient matrix $A$. Then

\[ A_e = EA, \]

where $E$ is the product of elementary matrices, corresponding to the row operations, $E = E_N \ldots E_2 E_1$. If $A_e$ has a zero row, then the last row is also zero. Therefore, if we put $\mathbf{b}_e = (0, \ldots, 0, 1)^T$ (all entries are 0, except the last one), then the equation

\[ A_e\mathbf{x} = \mathbf{b}_e \]

does not have a solution. Multiplying this equation by $E^{-1}$ from the left, and recalling that $E^{-1}A_e = A$, we get that the equation

\[ A\mathbf{x} = E^{-1}\mathbf{b}_e \]

does not have a solution.


Finally, statement 3 immediately follows from statements 1 and 2. 
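
The pivot criteria above can be turned into a small test. The sketch below is only an illustration: the names `pivot_columns` and `analyze_system` are made up, and the row reduction is the plain forward phase with exact fractions. A system is declared inconsistent when the last column of the augmented matrix contains a pivot, and a consistent system has a unique solution when every column of the coefficient matrix contains a pivot.

```python
from fractions import Fraction


def pivot_columns(matrix):
    """Row reduce a copy of `matrix` to echelon form; return the pivot column indices."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivots, top = [], 0
    for col in range(n):
        if top == m:
            break
        pr = next((r for r in range(top, m) if rows[r][col] != 0), None)
        if pr is None:
            continue
        rows[top], rows[pr] = rows[pr], rows[top]
        for r in range(top + 1, m):
            factor = rows[r][col] / rows[top][col]
            rows[r] = [x - factor * y for x, y in zip(rows[r], rows[top])]
        pivots.append(col)
        top += 1
    return pivots


def analyze_system(A, b):
    """Return (consistent, unique) for the system Ax = b."""
    n = len(A[0])                                   # number of unknowns
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    pivots = pivot_columns(aug)
    consistent = n not in pivots                    # no pivot in the last column
    unique = consistent and len(pivots) == n        # pivot in every column of A
    return consistent, unique
```

For example, the system $x_1 + x_2 = 1$, $2x_1 + 2x_2 = 3$ is reported as inconsistent, while replacing the second right side by 2 gives a consistent system with a free variable, so the solution is not unique.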
From the above analysis of pivots we get several very important corollaries. The main observation we use is


In echelon form, any row and any column have no more than 1 pivot in it (it can have 0 pivots).


3.1. Corollaries about linear independence and bases. Dimension. Questions as to when a system of vectors in $\mathbb{F}^n$ is a basis, a linearly independent or a spanning system, can be easily answered by the row reduction.


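Proposition 3.1. Let us have a system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$, and let $A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be an $n \times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Then

1. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is linearly independent iff echelon form of $A$ has a pivot in every column;


2. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is complete in $\mathbb{F}^n$ (spanning, generating) iff echelon form of $A$ has a pivot in every row;


3. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ is a basis in $\mathbb{F}^n$ iff echelon form of $A$ has a pivot in every column and in every row.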


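Proof. The system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is linearly independent if and only
if the equation

$$x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_m \mathbf{v}_m = \mathbf{0}$$

has the unique (trivial) solution $x_1 = x_2 = \ldots = x_m = 0$, or equivalently,
the equation $A\mathbf{x} = \mathbf{0}$ has the unique solution $\mathbf{x} = \mathbf{0}$. By statement 1 above, this
happens if and only if there is a pivot in every column of the echelon form of the matrix.

Similarly, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is complete in $\mathbb{F}^n$ if and only
if the equation

$$x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_m \mathbf{v}_m = \mathbf{b}$$

has a solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 2 above, this happens
if and only if there is a pivot in every row of the echelon form of the matrix.

And finally, the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ is a basis in $\mathbb{F}^n$ if and only
if the equation

$$x_1 \mathbf{v}_1 + x_2 \mathbf{v}_2 + \ldots + x_m \mathbf{v}_m = \mathbf{b}$$

has a unique solution for any right side $\mathbf{b} \in \mathbb{F}^n$. By statement 3 this happens
if and only if there is a pivot in every column and in every row of the echelon
form of $A$.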
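
The criteria of Proposition 3.1 are easy to try out on a computer. The following is only a minimal sketch in Python, not part of the original notes: the names `echelon_pivots` and `classify` are made up for this illustration, and exact rational arithmetic (`fractions.Fraction`) is assumed so that zero entries are recognized reliably.

```python
from fractions import Fraction

def echelon_pivots(rows):
    """Row-reduce a copy of the matrix to an echelon form.

    Returns (echelon_form, list_of_pivot_column_indices).
    """
    a = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(a[0])):
        # look for a non-zero entry in column c, at or below row r
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        a[r], a[p] = a[p], a[r]           # row exchange
        for i in range(r + 1, len(a)):    # kill the entries below the pivot
            f = a[i][c] / a[r][c]
            a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return a, pivots

def classify(vectors):
    """vectors: m vectors in F^n. Returns (independent, spanning, basis)."""
    n, m = len(vectors[0]), len(vectors)
    # the vectors are the COLUMNS of A, so A is n x m
    A = [[v[i] for v in vectors] for i in range(n)]
    _, pivots = echelon_pivots(A)
    independent = len(pivots) == m        # pivot in every column
    spanning = len(pivots) == n           # pivot in every row
    return independent, spanning, independent and spanning

print(classify([[1, 0, 0], [1, 1, 0], [1, 1, 1]]))  # three vectors in R^3
print(classify([[1, 0], [0, 1], [1, 1]]))           # three vectors in R^2
```

Since each row and each column holds at most one pivot, counting the pivots is enough: the system is independent when the count equals the number of columns, and complete when it equals the number of rows.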


Proposition 3.2. Any linearly independent system of vectors in $\mathbb{F}^n$ cannot
have more than $n$ vectors in it.

Proof. Let a system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in \mathbb{F}^n$ be linearly independent, and let
$A = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m]$ be the $n \times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. By
Proposition 3.1 the echelon form of $A$ must have a pivot in every column, which
is impossible if $m > n$ (the number of pivots cannot be more than the number of
rows).


Proposition 3.3. Any two bases in a vector space $V$ have the same number
of vectors in them.


Proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$ be two different bases in $V$.
Without loss of generality we can assume that $n \le m$. Consider an isomorphism
$A : \mathbb{F}^n \to V$ defined by

$$A\mathbf{e}_k = \mathbf{v}_k, \qquad k = 1, 2, \ldots, n,$$

where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the standard basis in $\mathbb{F}^n$.
Since $A^{-1}$ is also an isomorphism, the system

$$A^{-1}\mathbf{w}_1, A^{-1}\mathbf{w}_2, \ldots, A^{-1}\mathbf{w}_m$$

is a basis (see Theorem 6.6 in Chapter 1). So it is linearly independent,
and by Proposition 3.2, $m \le n$. Together with the assumption $n \le m$ this
implies that $m = n$.


The statement below is a particular case of the above proposition. 


Proposition 3.4. Any basis in $\mathbb{F}^n$ must have exactly $n$ vectors in it.


Proof. This fact follows immediately from the previous proposition, but
there is also a direct proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a basis in $\mathbb{F}^n$ and let $A$ be
the $n \times m$ matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. The fact that the system is
a basis means that the equation

$$A\mathbf{x} = \mathbf{b}$$

has a unique solution for any (all possible) right side $\mathbf{b}$. The existence means
that there is a pivot in every row (of a reduced echelon form of the matrix),
hence the number of pivots is exactly $n$. The uniqueness means that there is
a pivot in every column of the coefficient matrix (its echelon form), so

$$m = \text{number of columns} = \text{number of pivots} = n.$$


Proposition 3.5. Any spanning (generating) set in $\mathbb{F}^n$ must have at least
$n$ vectors.


Proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$ be a complete system in $\mathbb{F}^n$, and let $A$ be the $n \times m$
matrix with columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m$. Statement 2 of Proposition 3.1 implies
that the echelon form of $A$ has a pivot in every row. Since the number of pivots
cannot exceed the number of columns, $n \le m$.


3.2. Corollaries about invertible matrices. 


Proposition 3.6. A matrix $A$ is invertible if and only if its echelon form
has a pivot in every column and every row.


Proof. As it was discussed in the beginning of the section, the equation
$A\mathbf{x} = \mathbf{b}$ has a unique solution for any right side $\mathbf{b}$ if and only if the echelon
form of $A$ has a pivot in every row and every column. But we know, see
Theorem 6.8 in Chapter 1, that the matrix (linear transformation) $A$ is
invertible if and only if the equation $A\mathbf{x} = \mathbf{b}$ has a unique solution for any
possible right side $\mathbf{b}$.


There is also an alternative proof. We know that a matrix is invertible
if and only if its columns form a basis in $\mathbb{F}^n$, where $n$ is the number of rows
(see Corollary 6.9 in Section 6.4, Chapter 1). Statement 3 of Proposition 3.1
above states that this happens if and only if there is a pivot in every row
and every column.


The above proposition immediately implies the following

Corollary 3.7. An invertible matrix must be square ($n \times n$).


Proposition 3.8. If a square ($n \times n$) matrix is left invertible, or if it is right
invertible, then it is invertible. In other words, to check the invertibility of a
square matrix $A$ it is sufficient to check only one of the conditions $AA^{-1} = I$,
$A^{-1}A = I$.


Note, that this proposition applies only to square matrices! 


Proof. We know that matrix A is invertible if and only if the equation 
Ax = b has a unique solution for any right side b. This happens if and only 
if the echelon form of the matrix A has pivots in every row and in every 
column. 


If a matrix A is left invertible, the equation Ax = 0 has unique solution 
x = 0. Indeed, if B is a left inverse of A (i.e. BA =I), and x satisfies 


Ax = 0, 


then multiplying this identity by $B$ from the left we get $\mathbf{x} = \mathbf{0}$, so the
solution is unique. Therefore, the echelon form of $A$ has pivots in every
column (no free variables). If the matrix $A$ is square ($n \times n$), the echelon
form also has pivots in every row ($n$ pivots, and a row can have no more
than 1 pivot), so the matrix is invertible.

If a matrix $A$ is right invertible, and $C$ is its right inverse ($AC = I$),
then for $\mathbf{x} = C\mathbf{b}$, $\mathbf{b} \in \mathbb{F}^n$,

$$A\mathbf{x} = AC\mathbf{b} = I\mathbf{b} = \mathbf{b}.$$


Therefore, for any right side $\mathbf{b}$ the equation $A\mathbf{x} = \mathbf{b}$ has a solution $\mathbf{x} = C\mathbf{b}$.
Thus, the echelon form of $A$ has a pivot in every row. If $A$ is square, it also has
a pivot in every column, so $A$ is invertible.
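
To see why Proposition 3.8 is stated only for square matrices, here is a small illustrative sketch in Python (the matrices `A`, `B` and the helper `matmul` are chosen for this example only and do not come from the text). It exhibits a $3 \times 2$ matrix that has a left inverse, while the same matrix is not a right inverse.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# A is 3 x 2 and B is 2 x 3 (both chosen only for this illustration)
A = [[1, 0],
     [0, 1],
     [0, 0]]
B = [[1, 0, 0],
     [0, 1, 0]]

BA = matmul(B, A)   # the 2 x 2 identity: B is a left inverse of A
AB = matmul(A, B)   # 3 x 3 with a zero last row: B is not a right inverse of A

print(BA)
print(AB)
```

In fact this `A` has no right inverse at all: it is already in echelon form with no pivot in the last row, so $A\mathbf{x} = \mathbf{b}$ is not solvable for every right side $\mathbf{b}$.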


Exercises. 


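3.1. For what value of $b$ does the system

$$\begin{pmatrix} 1 & 2 & 2 \\ 2 & 4 & 6 \\ 1 & 2 & 3 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 1 \\ 4 \\ b \end{pmatrix}$$

have a solution? Find the general solution of the system for this value of $b$.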


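3.2. Determine if the vectors

$$\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \end{pmatrix}$$

are linearly independent or not.
Do these four vectors span $\mathbb{R}^4$? (In other words, is it a generating system?)

What about $\mathbb{C}^4$?

3.3. Determine which of the following systems of vectors are bases in $\mathbb{R}^3$:

a) $(1, 2, -1)^T$, $(1, 0, 2)^T$, $(2, 1, 1)^T$;

b) $(-1, 3, 2)^T$, $(-3, 1, 3)^T$, $(2, 10, 2)^T$;

c) $(67, 13, -47)^T$, $(\pi, -7.84, 0)^T$, $(3, 0, 0)^T$.

Which of the systems are bases in $\mathbb{C}^3$?


3.4. Do the polynomials $x^3 + 2x$, $x^2 + x + 1$, $x^3 + 5$ generate (span) $\mathbb{P}_3$? Justify
your answer.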


3.5. Can 5 vectors in $\mathbb{F}^4$ be linearly independent? Justify your answer.


3.6. Prove or disprove: If the columns of a square ($n \times n$) matrix $A$ are linearly
independent, so are the columns of $A^2 = AA$.


3.7. Prove or disprove: If the columns of a square ($n \times n$) matrix $A$ are linearly
independent, so are the rows of $A^3 = AAA$.


3.8. Show that if the equation Ax = 0 has unique solution (i.e. if echelon form of 
A has pivot in every column), then A is left invertible. Hint: elementary matrices 
may help. 

Note: It was shown in the text that if A is left invertible, then the equation Ax = 0
has unique solution. But here you are asked to prove the converse of this statement,
which was not proved.

Remark: This can be a very hard problem, for it requires deep understanding
of the subject. However, when you understand what to do, the problem becomes
almost trivial.


3.9. Is the reduced echelon form of a matrix unique? Justify your conclusion.
Namely, suppose that by performing some row operations (not necessarily following
any algorithm) we end up with a reduced echelon matrix. Do we always
end up with the same matrix, or can we get different ones? Note that we are only
allowed to perform row operations, the “column operations” are forbidden.
Hint: What happens if you start with an invertible matrix? Also, are the pivots
always in the same columns, or can it be different depending on what row operations
you perform? If you can tell what the pivot columns are without reverting to row
operations, then the positions of pivot columns do not depend on them.


4. Finding $A^{-1}$ by row reduction.


As it was discussed above, an invertible matrix must be square, and its echelon
form must have pivots in every row and every column. Therefore the reduced
echelon form of an invertible matrix is the identity matrix $I$. Therefore,


Any invertible matrix is row equivalent (i.e. can be reduced by row 
operations) to the identity matrix. 


Now let us state a simple algorithm for finding the inverse of an $n \times n$
matrix:


1. Form an augmented $n \times 2n$ matrix $(A \mid I)$ by writing the $n \times n$ identity
matrix to the right of $A$;

2. Performing row operations on the augmented matrix, transform $A$ to
the identity matrix $I$;

3. The matrix $I$ that we added will be automatically transformed to
$A^{-1}$;

4. If it is impossible to transform $A$ to the identity by row operations,
$A$ is not invertible.


There are several possible explanations of the above algorithm. The
first, a naive one, is as follows: we know that (for an invertible $A$) the vector
$A^{-1}\mathbf{b}$ is the solution of the equation $A\mathbf{x} = \mathbf{b}$. So to find the column number
$k$ of $A^{-1}$ we need to find the solution of $A\mathbf{x} = \mathbf{e}_k$, where $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ is the
standard basis in $\mathbb{R}^n$. The above algorithm just solves the equations

$$A\mathbf{x} = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n$$

simultaneously!


Let us also present another, more “advanced” explanation. As we discussed
above, every row operation can be realized as a left multiplication
by an elementary matrix. Let $E_1, E_2, \ldots, E_N$ be the elementary matrices
corresponding to the row operations we performed, and let $E = E_N \cdots E_2 E_1$
be their product.$^1$ We know that the row operations transform $A$ to the identity,
i.e. $EA = I$, so $E = A^{-1}$. But the same row operations transform the
augmented matrix $(A \mid I)$ to $(EA \mid E) = (I \mid A^{-1})$.

This “advanced” explanation using elementary matrices implies an important
proposition that will be often used later.


Theorem 4.1. Any invertible matrix can be represented as a product of
elementary matrices.


Proof. As we discussed in the previous paragraph, $A^{-1} = E_N \cdots E_2 E_1$, so

$$A = (A^{-1})^{-1} = E_1^{-1} E_2^{-1} \cdots E_N^{-1}.$$


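An Example. Suppose we want to find the inverse of the matrix

$$\begin{pmatrix} 1 & 4 & -2 \\ -2 & -7 & 7 \\ 3 & 11 & -6 \end{pmatrix}.$$

Augmenting the identity matrix to it and performing row reduction we get

$$\left(\begin{array}{rrr|rrr} 1 & 4 & -2 & 1 & 0 & 0 \\ -2 & -7 & 7 & 0 & 1 & 0 \\ 3 & 11 & -6 & 0 & 0 & 1 \end{array}\right)
\begin{array}{l} \\ +2R_1 \\ -3R_1 \end{array}
\sim
\left(\begin{array}{rrr|rrr} 1 & 4 & -2 & 1 & 0 & 0 \\ 0 & 1 & 3 & 2 & 1 & 0 \\ 0 & -1 & 0 & -3 & 0 & 1 \end{array}\right)
\begin{array}{l} \\ \\ +R_2 \end{array}
\sim$$

$$\left(\begin{array}{rrr|rrr} 1 & 4 & -2 & 1 & 0 & 0 \\ 0 & 1 & 3 & 2 & 1 & 0 \\ 0 & 0 & 3 & -1 & 1 & 1 \end{array}\right)
\begin{array}{l} \times 3 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr} 3 & 12 & -6 & 3 & 0 & 0 \\ 0 & 1 & 3 & 2 & 1 & 0 \\ 0 & 0 & 3 & -1 & 1 & 1 \end{array}\right)
\begin{array}{l} +2R_3 \\ -R_3 \\ \\ \end{array}
\sim$$

Here in the last row operation we multiplied the first row by 3 to avoid
fractions in the backward phase of row reduction. Continuing with the row
reduction we get

$$\left(\begin{array}{rrr|rrr} 3 & 12 & 0 & 1 & 2 & 2 \\ 0 & 1 & 0 & 3 & 0 & -1 \\ 0 & 0 & 3 & -1 & 1 & 1 \end{array}\right)
\begin{array}{l} -12R_2 \\ \\ \\ \end{array}
\sim
\left(\begin{array}{rrr|rrr} 3 & 0 & 0 & -35 & 2 & 14 \\ 0 & 1 & 0 & 3 & 0 & -1 \\ 0 & 0 & 3 & -1 & 1 & 1 \end{array}\right).$$

Dividing the first and the last row by 3 we get the inverse matrix

$$\begin{pmatrix} -35/3 & 2/3 & 14/3 \\ 3 & 0 & -1 \\ -1/3 & 1/3 & 1/3 \end{pmatrix}.$$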
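
The algorithm of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `inverse` is made up, exact fractions are used instead of floating point, and each pivot is normalized to 1 immediately (a Gauss-Jordan variant), so the intermediate matrices differ from the hand computation above although the result is the same.

```python
from fractions import Fraction

def inverse(rows):
    """Row-reduce (A | I) to (I | A^{-1}); return None if A is not invertible."""
    n = len(rows)
    # augmented n x 2n matrix (A | I)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(rows)]
    for c in range(n):
        # find a row with a non-zero entry in column c, at or below row c
        p = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if p is None:
            return None                      # no pivot in column c
        aug[c], aug[p] = aug[p], aug[c]      # row exchange
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]] # scale the pivot row
        for i in range(n):                   # clear the rest of column c
            if i != c:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]          # the right half is A^{-1}

A = [[1, 4, -2],
     [-2, -7, 7],
     [3, 11, -6]]
for row in inverse(A):
    print([str(x) for x in row])
```

For the matrix of the example this reproduces the rows $(-35/3, 2/3, 14/3)$, $(3, 0, -1)$, $(-1/3, 1/3, 1/3)$ found above.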


$^1$Although it does not matter here, please notice that if the row operation $E_1$ was
performed first, $E_1$ must be the rightmost term in the product.

Exercises.


4.1. Find the inverse of the matrices

$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 7 & 3 \\ 2 & 3 & 4 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -1 & 2 \\ 1 & 1 & -2 \\ 1 & 1 & 4 \end{pmatrix}.$$

Show all steps.


5. Dimension. Finite-dimensional spaces. 


Definition. The dimension $\dim V$ of a vector space $V$ is the number of
vectors in a basis.

For a vector space consisting only of the zero vector $\mathbf{0}$ we put $\dim V = 0$. If
$V$ does not have a (finite) basis, we put $\dim V = \infty$.


If dim V is finite, we call the space V finite-dimensional; otherwise we 
call it infinite-dimensional. 


Proposition 3.3 asserts that the dimension is well defined, i.e. that it 
does not depend on the choice of a basis. 


Proposition 2.8 from Chapter 1 states that any finite spanning system 
in a vector space V contains a basis. This immediately implies the following 


Proposition 5.1. A vector space V is finite-dimensional if and only if it 
has a finite spanning system. 


Suppose that we have a system of vectors in a finite-dimensional vector
space, and we want to check if it is a basis (or if it is linearly independent,
or if it is complete). Probably the simplest way is to use an isomorphism
$A : V \to \mathbb{R}^n$, $n = \dim V$, to move the problem to $\mathbb{R}^n$, where all such questions
can be answered by row reduction (studying pivots).

Note that if $\dim V = n$, then there always exists an isomorphism
$A : V \to \mathbb{R}^n$. Indeed, if $\dim V = n$ then there exists a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n \in V$,
and one can define an isomorphism $A : V \to \mathbb{R}^n$ by

$$A\mathbf{v}_k = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n.$$


As an example, let us give the following two corollaries of the above 
Propositions 3.2, 3.5: 


Proposition 5.2. Any linearly independent system in a finite-dimensional
vector space $V$ cannot have more than $\dim V$ vectors in it.


Proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a linearly independent system, and let
$A : V \to \mathbb{R}^n$, $n = \dim V$, be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a linearly
independent system in $\mathbb{R}^n$, and by Proposition 3.2 $m \le n$.

Proposition 5.3. Any generating system in a finite-dimensional vector
space $V$ must have at least $\dim V$ vectors in it.


Proof. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m \in V$ be a complete system in $V$, and let
$A : V \to \mathbb{R}^n$, $n = \dim V$, be an isomorphism. Then $A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_m$ is a complete
system in $\mathbb{R}^n$, and by Proposition 3.5 $m \ge n$.


5.1. Completing a linearly independent system to a basis. The fol- 
lowing statement will play an important role later. 


Proposition 5.4 (Completion to a basis). A linearly independent system
of vectors in a finite-dimensional space can be completed to a basis, i.e. if
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are linearly independent vectors in a finite-dimensional vector
space $V$ then one can find vectors $\mathbf{v}_{r+1}, \mathbf{v}_{r+2}, \ldots, \mathbf{v}_n$ such that the system of
vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is a basis in $V$.


Proof. Let $n = \dim V$. If the system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is already generating, it is
a basis and there is nothing to prove. Otherwise, take any vector not belonging
to $\operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}$ and call it $\mathbf{v}_{r+1}$ (one can always do that because the
system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is not generating). By Exercise 2.5 from Chapter 1 the
system $\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent (notice that in this case $r < n$
by Proposition 5.2). Repeat the procedure with the new system to get the vector
$\mathbf{v}_{r+2}$, and so on.


We will stop the process when we get a generating system. Note that
the process cannot continue infinitely, because a linearly independent system
of vectors in $V$ cannot have more than $n = \dim V$ vectors.
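
In $\mathbb{F}^n$ the procedure from the proof can be carried out mechanically. The sketch below is one possible realization and not the author's method: as candidates for the new vectors it tries the standard basis vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ in order and keeps a candidate only if it lies outside the span of the current system (checked via pivots, as in Proposition 3.1). The names `is_independent` and `complete_to_basis` are hypothetical, and the input is assumed to be a non-empty linearly independent system.

```python
from fractions import Fraction

def is_independent(vectors):
    """True iff the echelon form of [v1 ... vm] has a pivot in every column."""
    n = len(vectors[0])
    a = [[Fraction(v[i]) for v in vectors] for i in range(n)]  # vectors as columns
    r = 0
    for c in range(len(vectors)):
        p = next((i for i in range(r, n) if a[i][c] != 0), None)
        if p is None:
            return False                  # column c has no pivot
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, n):
            f = a[i][c] / a[r][c]
            a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        r += 1
    return True

def complete_to_basis(vectors):
    """Extend a linearly independent system in F^n to a basis of F^n."""
    n = len(vectors[0])
    system = [list(v) for v in vectors]
    for k in range(n):
        if len(system) == n:
            break                         # n independent vectors: a basis
        e_k = [int(i == k) for i in range(n)]
        if is_independent(system + [e_k]):
            system.append(e_k)            # e_k is outside the current span
    return system

print(complete_to_basis([[1, 0, 0], [1, 1, 0]]))
```

In this small example $\mathbf{e}_1$ and $\mathbf{e}_2$ are rejected because they already belong to the span of the two given vectors, and $\mathbf{e}_3$ completes the system.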


5.2. Subspaces of finite dimensional spaces. 


Theorem 5.5. Let $V$ be a subspace of a vector space $W$, $\dim W < \infty$. Then
$V$ is finite dimensional and $\dim V \le \dim W$.

Moreover, if $\dim V = \dim W$ then $V = W$ (we are still assuming that $V$
is a subspace of $W$ here).


Remark. This theorem looks like a complete banality, like an easy corollary 
of Proposition 5.2. But we can apply Proposition 5.2 only if we already have 
a basis in V. And we only have a basis in W, and we cannot say how many 
vectors in this basis belong to V; in fact, it is easy to construct an example 
where none of the vectors in the basis of W belongs to V. 


Proof of Theorem 5.5. If V = {0} then the theorem is trivial, so let us 
assume otherwise. 


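We want to find a basis in $V$. Take a non-zero vector $\mathbf{v}_1 \in V$. If
$V = \operatorname{span}\{\mathbf{v}_1\}$, we got our basis (consisting of the one vector $\mathbf{v}_1$).

If not, we continue by induction. Suppose we constructed $r$ linearly
independent vectors $\mathbf{v}_1, \ldots, \mathbf{v}_r \in V$. If $V = \operatorname{span}\{\mathbf{v}_k : 1 \le k \le r\}$,
then we have found a basis in $V$. If not, there exists a vector $\mathbf{v}_{r+1} \in V$,
$\mathbf{v}_{r+1} \notin \operatorname{span}\{\mathbf{v}_k : 1 \le k \le r\}$. By Exercise 2.5 from Chapter 1 the system
$\mathbf{v}_1, \ldots, \mathbf{v}_r, \mathbf{v}_{r+1}$ is linearly independent.

This process cannot continue infinitely: the vectors we construct form a
linearly independent system in $W$, and by Proposition 5.2 such a system cannot
have more than $\dim W$ vectors. So the process stops with a basis in $V$ consisting
of at most $\dim W$ vectors, i.e. $\dim V \le \dim W$.

Finally, let $\dim V = \dim W = n$. A basis in $V$ is then a linearly independent
system of $n$ vectors in $W$. If it did not span $W$, we could add to it a vector of $W$
outside of its span and get (by Exercise 2.5 from Chapter 1 again) a linearly
independent system of $n + 1$ vectors in $W$, which contradicts Proposition 5.2.
Therefore $V = W$.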


Exercises.

5.1. True or false:

a) Every vector space that is generated by a finite set has a basis;

b) Every vector space has a (finite) basis;

c) A vector space cannot have more than one basis;

d) If a vector space has a finite basis, then the number of vectors in every
basis is the same;

e) The dimension of $\mathbb{P}_n$ is $n$;

f) The dimension of $M_{m \times n}$ is $m + n$;

g) If vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ generate (span) the vector space $V$, then every
vector in $V$ can be written as a linear combination of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$
in only one way;

h) Every subspace of a finite-dimensional space is finite-dimensional;

i) If $V$ is a vector space having dimension $n$, then $V$ has exactly one subspace
of dimension 0 and exactly one subspace of dimension $n$.


5.2. Prove that if $V$ is a vector space having dimension $n$, then a system of vectors
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in $V$ is linearly independent if and only if it spans $V$.


5.3. Prove that a linearly independent system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in a vector
space $V$ is a basis if and only if $n = \dim V$.


5.4. (An old problem revisited: now this problem should be easy) Is it possible that
vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ are linearly dependent, but the vectors $\mathbf{w}_1 = \mathbf{v}_1 + \mathbf{v}_2$, $\mathbf{w}_2 = \mathbf{v}_2 + \mathbf{v}_3$
and $\mathbf{w}_3 = \mathbf{v}_3 + \mathbf{v}_1$ are linearly independent? Hint: What dimension can the
subspace $\operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$ have?


5.5. Let vectors $\mathbf{u}, \mathbf{v}, \mathbf{w}$ be a basis in $V$. Show that $\mathbf{u} + \mathbf{v} + \mathbf{w}$, $\mathbf{v} + \mathbf{w}$, $\mathbf{w}$ is also a
basis in $V$.

5.6. Consider in the space $\mathbb{R}^5$ vectors $\mathbf{v}_1 = (2, -1, 1, 5, -3)^T$, $\mathbf{v}_2 = (3, -2, 0, 0, 0)^T$,
$\mathbf{v}_3 = (1, 1, 50, -921, 0)^T$.


a) Prove that these vectors are linearly independent.


b) Complete this system of vectors to a basis.

If you do part b) first you can do everything without any computations.


6. General solution of a linear system.


In this short section we discuss the structure of the general solution (i.e. of
the solution set) of a linear system.

We call a system $A\mathbf{x} = \mathbf{b}$ homogeneous if the right side $\mathbf{b} = \mathbf{0}$, i.e. a
homogeneous system is a system of form $A\mathbf{x} = \mathbf{0}$.


With each system

$$A\mathbf{x} = \mathbf{b}$$

we can associate a homogeneous system just by putting $\mathbf{b} = \mathbf{0}$.

Theorem 6.1 (General solution of a linear equation). Let a vector $\mathbf{x}_1$ satisfy
the equation $A\mathbf{x} = \mathbf{b}$, and let $H$ be the set of all solutions of the associated
homogeneous system

$$A\mathbf{x} = \mathbf{0}.$$

Then the set

$$\{\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h : \mathbf{x}_h \in H\}$$

is the set of all solutions of the equation $A\mathbf{x} = \mathbf{b}$.


In other words, this theorem can be stated as


$$\text{General solution of } A\mathbf{x} = \mathbf{b} \;=\; \text{A particular solution of } A\mathbf{x} = \mathbf{b} \;+\; \text{General solution of } A\mathbf{x} = \mathbf{0}.$$


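Proof. Fix a vector $\mathbf{x}_1$ satisfying $A\mathbf{x}_1 = \mathbf{b}$. Let a vector $\mathbf{x}_h$ satisfy $A\mathbf{x}_h = \mathbf{0}$.
Then for $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ we have

$$A\mathbf{x} = A(\mathbf{x}_1 + \mathbf{x}_h) = A\mathbf{x}_1 + A\mathbf{x}_h = \mathbf{b} + \mathbf{0} = \mathbf{b},$$

so any $\mathbf{x}$ of form

$$\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h, \qquad \mathbf{x}_h \in H$$

is a solution of $A\mathbf{x} = \mathbf{b}$.


Now let $\mathbf{x}$ satisfy $A\mathbf{x} = \mathbf{b}$. Then for $\mathbf{x}_h := \mathbf{x} - \mathbf{x}_1$ we get

$$A\mathbf{x}_h = A(\mathbf{x} - \mathbf{x}_1) = A\mathbf{x} - A\mathbf{x}_1 = \mathbf{b} - \mathbf{b} = \mathbf{0},$$

so $\mathbf{x}_h \in H$. Therefore any solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$ can be represented as
$\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_h$ with some $\mathbf{x}_h \in H$.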


The power of this theorem is in its generality. It applies to all linear 
equations, we do not have to assume here that vector spaces are finite- 
dimensional. You will meet this theorem in differential equations, integral 
equations, partial differential equations, etc. Besides showing the struc- 
ture of the solution set, this theorem allows one to separate investigation 
of uniqueness from the study of existence. Namely, to study uniqueness, 
we only need to analyze uniqueness of the homogeneous equation Ax = 0, 
which always has a solution.

There is an immediate application in this course: this theorem allows us
to check a solution of a system $A\mathbf{x} = \mathbf{b}$. For example, consider a system

$$\begin{pmatrix} 2 & 3 & 1 & 4 & -9 \\ 1 & 1 & 1 & 1 & -3 \\ 1 & 1 & 1 & 2 & -5 \\ 2 & 2 & 2 & 3 & -8 \end{pmatrix} \mathbf{x} = \begin{pmatrix} 17 \\ 6 \\ 8 \\ 14 \end{pmatrix}.$$

Performing row reduction one can find the solution of this system

$$\mathbf{x} = \begin{pmatrix} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} 2 \\ -1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \qquad x_3, x_5 \in \mathbb{F}. \tag{6.1}$$

The parameters $x_3$, $x_5$ can be denoted here by any other letters, $t$ and $s$,
for example; we are keeping notation $x_3$ and $x_5$ here only to remind us that
the parameters came from the corresponding free variables.


Now, let us suppose that we are just given this solution, and we want
to check whether or not it is correct. Of course, we can repeat the row
operations, but this is too time consuming. Moreover, if the solution was
obtained by some non-standard method, it can look different from what
we get from the row reduction. For example, the formula

$$\mathbf{x} = \begin{pmatrix} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 0 \\ 1 \\ 2 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{F} \tag{6.2}$$

gives the same set as (6.1) (can you say why?); here we just replaced the last
vector in (6.1) by its sum with the second one. So, this formula is different
from the solution we got from the row reduction, but it is nevertheless
correct.


The simplest way to check that (6.1) and (6.2) give us correct solutions
is to check that the first vector $(3, 1, 0, 2, 0)^T$ satisfies the equation $A\mathbf{x} = \mathbf{b}$,
and that the other two (the ones with the parameters $x_3$ and $x_5$ or $s$ and $t$ in
front of them) satisfy the associated homogeneous equation $A\mathbf{x} = \mathbf{0}$.

If this checks out, we will be assured that any vector x defined by (6.1) 
or (6.2) is indeed a solution. 
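
This check is a matter of a few matrix-vector multiplications, which the following Python sketch carries out. It is only an illustration: the helper `apply` is a made-up name, and the entries of `A` and `b` are taken from the system of this section as it is written out again in Section 7.3 (rows $(2, 3, 1, 4, -9)$, $(1, 1, 1, 1, -3)$, $(1, 1, 1, 2, -5)$, $(2, 2, 2, 3, -8)$ and right side $(17, 6, 8, 14)^T$).

```python
def apply(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[2, 3, 1, 4, -9],
     [1, 1, 1, 1, -3],
     [1, 1, 1, 2, -5],
     [2, 2, 2, 3, -8]]
b = [17, 6, 8, 14]

x1 = [3, 1, 0, 2, 0]        # particular solution used in (6.1) and (6.2)
h1 = [-2, 1, 1, 0, 0]       # vector at x_3 in (6.1), at s in (6.2)
h2 = [2, -1, 0, 2, 1]       # vector at x_5 in (6.1)
h2_alt = [0, 0, 1, 2, 1]    # vector at t in (6.2), equal to h1 + h2

print(apply(A, x1) == b)                           # A x1 = b
print([apply(A, h) for h in (h1, h2, h2_alt)])     # each should be the zero vector

# any particular choice of the parameters then gives a solution as well
s, t = 5, -7
x = [p + s * u + t * v for p, u, v in zip(x1, h1, h2)]
print(apply(A, x) == b)
```

As stressed in the text, a successful check only shows that every vector given by the formula is a solution; it does not show that no solutions were missed.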

Note that this method of checking the solution does not guarantee that
(6.1) (or (6.2)) gives us all the solutions. For example, if we just somehow
miss out the term with $x_3$, the above method of checking will still work fine.

So, how can we guarantee that we did not miss any free variable, and
that there should not be an extra term in (6.1)?

What comes to mind is to count the pivots again. In this example, if
one does row operations, the number of pivots is 3. So indeed, there should
be 2 free variables, and it looks like we did not miss anything in (6.1).

To be able to prove this, we will need new notions of fundamental subspaces
and of rank of a matrix. I should also mention that in this particular
example, one does not have to perform all row operations to check that there
are only 2 free variables, and that formulas (6.1) and (6.2) both give correct
general solutions.
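
The pivot count mentioned above can also be done by machine. Here is a minimal sketch in Python, with a made-up helper `pivot_columns` and exact fractions; the coefficient matrix is again entered as it appears in Section 7.3.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Indices (starting from 0) of the pivot columns of an echelon form."""
    a = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(a[0])):
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue                      # free variable
        a[r], a[p] = a[p], a[r]
        for i in range(r + 1, len(a)):
            f = a[i][c] / a[r][c]
            a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return pivots

A = [[2, 3, 1, 4, -9],
     [1, 1, 1, 1, -3],
     [1, 1, 1, 2, -5],
     [2, 2, 2, 3, -8]]

pivots = pivot_columns(A)
free = [c for c in range(len(A[0])) if c not in pivots]
print("pivot columns:", [c + 1 for c in pivots])   # numbered from 1, as in the text
print("free variables:", [f"x{c + 1}" for c in free])
```

The free variables found this way are exactly $x_3$ and $x_5$, the two parameters appearing in (6.1).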


Exercises.

6.1. True or false:

a) Any system of linear equations has at least one solution;

b) Any system of linear equations has at most one solution;

c) Any homogeneous system of linear equations has at least one solution;

d) Any system of $n$ linear equations in $n$ unknowns has at least one solution;

e) Any system of $n$ linear equations in $n$ unknowns has at most one solution;

f) If the homogeneous system corresponding to a given system of linear
equations has a solution, then the given system has a solution;

g) If the coefficient matrix of a homogeneous system of $n$ linear equations in
$n$ unknowns is invertible, then the system has no non-zero solution;

h) The solution set of any system of $m$ equations in $n$ unknowns is a subspace
in $\mathbb{R}^n$;

i) The solution set of any homogeneous system of $m$ equations in $n$ unknowns
is a subspace in $\mathbb{R}^n$.


6.2. Find a $2 \times 3$ system (2 equations with 3 unknowns) such that its general
solution has a form

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + s \begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}, \qquad s \in \mathbb{R}.$$


7. Fundamental subspaces of a matrix. Rank.


As we discussed above in Section 7 of Chapter 1, with any linear transformation
$A : V \to W$ we can associate two subspaces, namely, its kernel, or
null space

$$\operatorname{Ker} A = \operatorname{Null} A := \{\mathbf{v} \in V : A\mathbf{v} = \mathbf{0}\} \subset V,$$

and its range

$$\operatorname{Ran} A = \{\mathbf{w} \in W : \mathbf{w} = A\mathbf{v} \text{ for some } \mathbf{v} \in V\} \subset W.$$

In other words, the kernel $\operatorname{Ker} A$ is the solution set of the homogeneous
equation $A\mathbf{x} = \mathbf{0}$, and the range $\operatorname{Ran} A$ is exactly the set of all right sides
$\mathbf{b} \in W$ for which the equation $A\mathbf{x} = \mathbf{b}$ has a solution.


If $A$ is an $m \times n$ matrix, i.e. a mapping from $\mathbb{F}^n$ to $\mathbb{F}^m$, then it follows
from the “column by coordinate” rule of the matrix multiplication that any
vector $\mathbf{w} \in \operatorname{Ran} A$ can be represented as a linear combination of columns of
$A$. This explains the name column space (notation $\operatorname{Col} A$), which is often
used instead of $\operatorname{Ran} A$.


If $A$ is a matrix, then in addition to $\operatorname{Ran} A$ and $\operatorname{Ker} A$ one can also
consider the range and kernel for the transposed matrix $A^T$. Often the term
row space is used for $\operatorname{Ran} A^T$ and the term left null space is used for $\operatorname{Ker} A^T$
(but usually no special notation is introduced).

The four subspaces $\operatorname{Ran} A$, $\operatorname{Ker} A$, $\operatorname{Ran} A^T$, $\operatorname{Ker} A^T$ are called the fundamental
subspaces of the matrix $A$. In this section we will study important
relations between the dimensions of the four fundamental subspaces.


We will need the following definition, which is one of the fundamental
notions of Linear Algebra.


Definition. Given a linear transformation (matrix) $A$ its rank, $\operatorname{rank} A$, is
the dimension of the range of $A$

$$\operatorname{rank} A := \dim \operatorname{Ran} A.$$


7.1. Computing fundamental subspaces and rank. To compute the
fundamental subspaces and rank of a matrix, one needs to do echelon reduction.
Namely, let $A$ be the matrix, and $A_e$ be its echelon form.


1. The pivot columns of the original matrix $A$ (i.e. the columns where
after row operations we will have pivots in the echelon form) give us
a basis (one of many possible) in $\operatorname{Ran} A$.

2. The pivot rows of the echelon form $A_e$ give us a basis in the row
space. Of course, it is possible just to transpose the matrix, and
then do row operations. But if we already have the echelon form of
$A$, say by computing $\operatorname{Ran} A$, then we get $\operatorname{Ran} A^T$ for free.

3. To find a basis in the null space $\operatorname{Ker} A$ one needs to solve the homogeneous
equation $A\mathbf{x} = \mathbf{0}$: the details will be seen from the example
below.


Example. Consider a matrix

$$\begin{pmatrix} 1 & 1 & 2 & 2 & 1 \\ 2 & 2 & 1 & 1 & 1 \\ 3 & 3 & 3 & 3 & 2 \\ 1 & 1 & -1 & -1 & 0 \end{pmatrix}.$$

Performing row operations we get the echelon form

$$\begin{pmatrix} \boxed{1} & 1 & 2 & 2 & 1 \\ 0 & 0 & \boxed{-3} & -3 & -1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

(the pivots are boxed here). So, the columns 1 and 3 of the original matrix,
i.e. the columns

$$\begin{pmatrix} 1 \\ 2 \\ 3 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 \\ 1 \\ 3 \\ -1 \end{pmatrix}$$

give us a basis in $\operatorname{Ran} A$. We also get a basis for the row space $\operatorname{Ran} A^T$ for
free: the first and second row of the echelon form of $A$, i.e. the vectors

$$\begin{pmatrix} 1 \\ 1 \\ 2 \\ 2 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ -3 \\ -3 \\ -1 \end{pmatrix}$$

(we put the vectors vertically here. The question of whether to put vectors
here vertically as columns, or horizontally as rows is really a matter of
convention. Our reason for putting them vertically is that although we call
$\operatorname{Ran} A^T$ the row space we define it as a column space of $A^T$).


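To compute the basis in the null space $\operatorname{Ker} A$ we need to solve the equation
$A\mathbf{x} = \mathbf{0}$. Compute the reduced echelon form of $A$, which in this example
is

$$\begin{pmatrix} \boxed{1} & 1 & 0 & 0 & 1/3 \\ 0 & 0 & \boxed{1} & 1 & 1/3 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}.$$

Note that when solving the homogeneous equation $A\mathbf{x} = \mathbf{0}$, it is not necessary
to write the whole augmented matrix, it is sufficient to work with the
coefficient matrix. Indeed, in this case the last column of the augmented
matrix is the column of zeroes, which does not change under row operations.
So, we can just keep this column in mind, without actually writing
it. Keeping this last zero column in mind, we can read the solution off the
reduced echelon form above:

$$\begin{aligned}
x_1 &= -x_2 - \tfrac{1}{3} x_5, \\
x_2 &\text{ is free}, \\
x_3 &= -x_4 - \tfrac{1}{3} x_5, \\
x_4 &\text{ is free}, \\
x_5 &\text{ is free},
\end{aligned}$$

or, in the vector form

$$\mathbf{x} = \begin{pmatrix} -x_2 - \frac{1}{3} x_5 \\ x_2 \\ -x_4 - \frac{1}{3} x_5 \\ x_4 \\ x_5 \end{pmatrix} = x_2 \begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} + x_4 \begin{pmatrix} 0 \\ 0 \\ -1 \\ 1 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} -1/3 \\ 0 \\ -1/3 \\ 0 \\ 1 \end{pmatrix}. \tag{7.1}$$

The vectors at each free variable, i.e. in our case the vectors

$$\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} 0 \\ 0 \\ -1 \\ 1 \\ 0 \end{pmatrix}, \qquad \begin{pmatrix} -1/3 \\ 0 \\ -1/3 \\ 0 \\ 1 \end{pmatrix}$$

form a basis in $\operatorname{Ker} A$.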
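
The computations of this example can be reproduced by a short program. The Python sketch below is an illustration only (the names `rref` and `null_space_basis` are made up, and exact fractions are assumed): it computes the reduced echelon form, reads off the pivot columns, and builds one vector of $\operatorname{Ker} A$ per free variable exactly as in (7.1).

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced echelon form, pivot column indices starting from 0)."""
    a = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(a[0])):
        p = next((i for i in range(r, len(a)) if a[i][c] != 0), None)
        if p is None:
            continue
        a[r], a[p] = a[p], a[r]
        pivot = a[r][c]
        a[r] = [x / pivot for x in a[r]]          # make the pivot equal to 1
        for i in range(len(a)):                   # clear the rest of the column
            if i != r:
                f = a[i][c]
                a[i] = [x - f * y for x, y in zip(a[i], a[r])]
        pivots.append(c)
        r += 1
    return a, pivots

def null_space_basis(rows):
    """One basis vector of Ker A for each free variable."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                        # the free variable itself
        for r, p in enumerate(pivots):
            v[p] = -R[r][f]                       # basic variables from row r
        basis.append(v)
    return basis

A = [[1, 1, 2, 2, 1],
     [2, 2, 1, 1, 1],
     [3, 3, 3, 3, 2],
     [1, 1, -1, -1, 0]]

R, pivots = rref(A)
column_space_basis = [[row[c] for row in A] for c in pivots]  # pivot columns of A
kernel_basis = null_space_basis(A)
print(pivots, column_space_basis)
print([[str(x) for x in v] for v in kernel_basis])
```

The non-zero rows of `R` also form a basis of the row space (row operations do not change the row space), although it is a different basis from the one obtained above from the non-reduced echelon form.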


Unfortunately, there is no shortcut for finding a basis in $\operatorname{Ker} A^T$, one
must solve the equation $A^T\mathbf{x} = \mathbf{0}$. The knowledge of the echelon form of $A$
does not help here.


7.2. Explanation of the computing bases in the fundamental subspaces.
So, why do the above methods indeed give us bases in the fundamental
subspaces?


7.2.1. The null space $\operatorname{Ker} A$. The case of the null space $\operatorname{Ker} A$ is probably
the simplest one: since we solved the equation $A\mathbf{x} = \mathbf{0}$, i.e. found all the
solutions, then any vector in $\operatorname{Ker} A$ is a linear combination of the vectors we
obtained. Thus, the vectors we obtained form a spanning system in $\operatorname{Ker} A$.
To see that the system is linearly independent, let us multiply each vector
by the corresponding free variable and add everything, see (7.1). Then for
each free variable $x_k$, the entry number $k$ of the resulting vector is exactly
$x_k$, see (7.1) again, so the only way this vector (the linear combination) can
be $\mathbf{0}$ is when all free variables are 0.


7.2.2. The column space $\operatorname{Ran} A$. Let us now explain why the method for
finding a basis in the column space $\operatorname{Ran} A$ works. First of all, notice that
the pivot columns of the reduced echelon form $A_{re}$ of $A$ form a basis in
$\operatorname{Ran} A_{re}$ (not in the column space of the original matrix, but of its reduced
echelon form!). Since row operations are just left multiplications by invertible
matrices, they do not change linear independence. Therefore, the pivot
columns of the original matrix $A$ are linearly independent.


Let us now show that the pivot columns of $A$ span the column space
of $A$. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be the pivot columns of $A$, and let $\mathbf{v}$ be an arbitrary
column of $A$. We want to show that $\mathbf{v}$ can be represented as a linear
combination of the pivot columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$,

$$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_r \mathbf{v}_r.$$

The reduced echelon form $A_{re}$ is obtained from $A$ by the left multiplication

$$A_{re} = EA,$$

where $E$ is a product of elementary matrices, so $E$ is an invertible matrix.
The vectors $E\mathbf{v}_1, E\mathbf{v}_2, \ldots, E\mathbf{v}_r$ are the pivot columns of $A_{re}$, and the column
$\mathbf{v}$ of $A$ is transformed to the column $E\mathbf{v}$ of $A_{re}$. Since the pivot columns
of $A_{re}$ form a basis in $\operatorname{Ran} A_{re}$, the vector $E\mathbf{v}$ can be represented as a linear
combination

$$E\mathbf{v} = \alpha_1 E\mathbf{v}_1 + \alpha_2 E\mathbf{v}_2 + \ldots + \alpha_r E\mathbf{v}_r.$$

Multiplying this equality by $E^{-1}$ from the left we get the representation

$$\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_r \mathbf{v}_r,$$

so indeed the pivot columns of $A$ span $\operatorname{Ran} A$.

7.2.3. The row space $\operatorname{Ran} A^T$. It is easy to see that the pivot rows of the
echelon form $A_e$ of $A$ are linearly independent. Indeed, let $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$
be the transposed (since we agreed always to put vectors vertically) pivot
rows of $A_e$. Suppose

$$\alpha_1 \mathbf{w}_1 + \alpha_2 \mathbf{w}_2 + \ldots + \alpha_r \mathbf{w}_r = \mathbf{0}.$$

Consider the first non-zero entry of $\mathbf{w}_1$. Since for all other vectors
$\mathbf{w}_2, \mathbf{w}_3, \ldots, \mathbf{w}_r$ the corresponding entries equal 0 (by the definition of echelon
form), we can conclude that $\alpha_1 = 0$. So we can just ignore the first term
in the sum.


Consider now the first non-zero entry of $\mathbf{w}_2$. The corresponding entries
of the vectors $\mathbf{w}_3, \ldots, \mathbf{w}_r$ are 0, so $\alpha_2 = 0$. Repeating this procedure, we
get that $\alpha_k = 0$ for all $k = 1, 2, \ldots, r$.

To see that vectors $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ span the row space, one can notice
that row operations do not change the row space. This can be obtained
directly from analyzing row operations, but we present here a more formal
way to demonstrate this fact.

For a transformation $A$ and a set $X$ let us denote by $A(X)$ the set of all
elements $y$ which can be represented as $y = A(x)$, $x \in X$,

$$A(X) := \{y = A(x) : x \in X\}.$$

If $A$ is an $m \times n$ matrix, and $A_e$ is its echelon form, $A_e$ is obtained from
$A$ by left multiplication

$$A_e = EA,$$

where $E$ is an $m \times m$ invertible matrix (the product of the corresponding
elementary matrices). Then

$$\operatorname{Ran} A_e^T = \operatorname{Ran}(A^T E^T) = A^T(\operatorname{Ran} E^T) = A^T(\mathbb{R}^m) = \operatorname{Ran} A^T,$$

so indeed $\operatorname{Ran} A^T = \operatorname{Ran} A_e^T$.


7.3. The Rank Theorem. Dimensions of fundamental subspaces.
There are many applications in which one needs to find a basis in the column
space or in the null space of a matrix. For example, as it was shown above,
solving a homogeneous equation $A\mathbf{x} = \mathbf{0}$ amounts to finding a basis in
the null space $\operatorname{Ker} A$. Finding a basis in the column space means simply
extracting a basis from a spanning set, by removing unnecessary vectors
(columns).


However, the most important application of the above methods of computing
bases of fundamental subspaces is the relations between their dimensions.

Theorem 7.1 (The Rank Theorem). For a matrix $A$

$$\operatorname{rank} A = \operatorname{rank} A^T.$$


This theorem is often stated as follows:


The column rank of a matrix coincides with its row rank.


The proof of this theorem is trivial, since dimensions of both $\operatorname{Ran} A$ and
$\operatorname{Ran} A^T$ are equal to the number of pivots in the echelon form of $A$.

The following theorem gives us important relations between dimensions
of the fundamental spaces. It is often also called the Rank Theorem.


Theorem 7.2. Let $A$ be an $m \times n$ matrix, i.e. a linear transformation from
$\mathbb{F}^n$ to $\mathbb{F}^m$. Then

1. $\dim \operatorname{Ker} A + \dim \operatorname{Ran} A = \dim \operatorname{Ker} A + \operatorname{rank} A = n$ (dimension of the
domain of $A$);

2. $\dim \operatorname{Ker} A^T + \dim \operatorname{Ran} A^T = \dim \operatorname{Ker} A^T + \operatorname{rank} A^T =
\dim \operatorname{Ker} A^T + \operatorname{rank} A = m$ (dimension of the target space of $A$).


Proof. The proof, modulo the above algorithms of finding bases in the 
fundamental subspaces, is almost trivial. The first statement is simply the 
fact that the number of free variables (dim Ker A) plus the number of basic 
variables (i.e. the number of pivots, i.e. rank A) adds up to the number of 
columns (i.e. to n). 
The second statement, if one takes into account that $\operatorname{rank} A = \operatorname{rank} A^T$,
is simply the first statement applied to $A^T$.
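
The pivot count behind both rank theorems can be illustrated with a short sketch. The matrix below is a hypothetical example (not from the text), and the helper names `rank` and `transpose` are chosen only for this illustration; the code counts pivots by exact row reduction and reads off the dimensions given by Theorem 7.2.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by row reduction (exact arithmetic)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = 0
    for col in range(len(m[0])):
        # look for a non-zero entry in this column at or below the current pivot row
        pivot_row = next((r for r in range(pivots, len(m)) if m[r][col] != 0), None)
        if pivot_row is None:
            continue  # no pivot here: this column corresponds to a free variable
        m[pivots], m[pivot_row] = m[pivot_row], m[pivots]
        for r in range(pivots + 1, len(m)):
            factor = m[r][col] / m[pivots][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[pivots])]
        pivots += 1
    return pivots

def transpose(rows):
    return [list(col) for col in zip(*rows)]

# Hypothetical 3 x 4 example: the third row is the sum of the first two.
A = [[1, 2, 0, 1],
     [0, 1, 1, 1],
     [1, 3, 1, 2]]

r = rank(A)
dim_ker_A = len(A[0]) - r    # statement 1 of Theorem 7.2: n - rank A
dim_ker_At = len(A) - r      # statement 2 of Theorem 7.2: m - rank A
print(r, rank(transpose(A)), dim_ker_A, dim_ker_At)
```

Here the row rank and the column rank are computed independently and agree, as Theorem 7.1 asserts.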


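As an application of the above theorem, let us recall the example from
Section 6. There we considered a system
$$\begin{pmatrix} 2 & 3 & 1 & 4 & -9 \\ 1 & 1 & 1 & 1 & -3 \\ 1 & 1 & 1 & 2 & -5 \\ 2 & 2 & 2 & 3 & -8 \end{pmatrix} x = \begin{pmatrix} 17 \\ 6 \\ 8 \\ 14 \end{pmatrix}$$
and we claimed that its general solution is given by
$$x = \begin{pmatrix} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + x_3 \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_5 \begin{pmatrix} 2 \\ -1 \\ 0 \\ 2 \\ 1 \end{pmatrix}, \qquad x_3, x_5 \in \mathbb{F},$$
or by
$$x = \begin{pmatrix} 3 \\ 1 \\ 0 \\ 2 \\ 0 \end{pmatrix} + s \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ 0 \\ 1 \\ 2 \\ 1 \end{pmatrix}, \qquad s, t \in \mathbb{F}.$$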


We checked in Section 6 that a vector x given by either formula is indeed 
a solution of the equation. But how can we guarantee that either of the
formulas describes all solutions?


First of all, we know that in either formula, the last 2 vectors (the ones 
multiplied by the parameters) belong to Ker A. It is easy to see that in 
either case both vectors are linearly independent (two vectors are linearly 
dependent if and only if one is a multiple of the other). 


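Now, let us count dimensions: interchanging the first and the second
rows and performing the first round of row operations
$$\begin{array}{r} \\ -2R_1 \\ -R_1 \\ -2R_1 \end{array} \begin{pmatrix} 1 & 1 & 1 & 1 & -3 \\ 2 & 3 & 1 & 4 & -9 \\ 1 & 1 & 1 & 2 & -5 \\ 2 & 2 & 2 & 3 & -8 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & 1 & 1 & -3 \\ 0 & 1 & -1 & 2 & -3 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 1 & -2 \end{pmatrix}$$
we see that there are three pivots already, so $\operatorname{rank} A \ge 3$. (Actually, we
already can see that the rank is 3, but it is enough just to have the estimate
here.) By Theorem 7.2, $\operatorname{rank} A + \dim \operatorname{Ker} A = 5$, hence $\dim \operatorname{Ker} A \le 2$, and
therefore there cannot be more than 2 linearly independent vectors in $\operatorname{Ker} A$.
Therefore, the last 2 vectors in either formula form a basis in $\operatorname{Ker} A$, so either
formula gives all solutions of the equation.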
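
The claim that both formulas produce solutions can be checked mechanically. The sketch below assumes the system as read from the row reduction in this section (coefficient rows $(2,3,1,4,-9)$, $(1,1,1,1,-3)$, $(1,1,1,2,-5)$, $(2,2,2,3,-8)$ and right side $(17,6,8,14)^T$); the helper names are chosen only for this illustration. It checks a grid of parameter values, so it illustrates the claim rather than proving it; the dimension count above is what shows that nothing is missed.

```python
A = [[2, 3, 1, 4, -9],
     [1, 1, 1, 1, -3],
     [1, 1, 1, 2, -5],
     [2, 2, 2, 3, -8]]
b = [17, 6, 8, 14]

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in matrix]

def combo(p, u, v, s, t):
    """The vector p + s*u + t*v."""
    return [pi + s * ui + t * vi for pi, ui, vi in zip(p, u, v)]

p = [3, 1, 0, 2, 0]                                # particular solution
first = ([-2, 1, 1, 0, 0], [2, -1, 0, 2, 1])       # vectors in the first formula
second = ([-2, 1, 1, 0, 0], [0, 0, 1, 2, 1])       # vectors in the second formula

def formula_ok(kernel_pair, params=range(-2, 3)):
    u, v = kernel_pair
    return all(apply(A, combo(p, u, v, s, t)) == b for s in params for t in params)

print(formula_ok(first), formula_ok(second))
```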
An important corollary of the rank theorem is the following theorem
connecting existence and uniqueness for linear equations.


Theorem 7.3. Let $A$ be an $m \times n$ matrix. Then the equation
$$Ax = b$$
has a solution for every $b \in \mathbb{R}^m$ if and only if the dual equation
$$A^T x = 0$$
has a unique (only the trivial) solution. (Note that in the second equation
we have $A^T$, not $A$.)


Proof. The proof follows immediately from Theorem 7.2 by counting the 
dimensions. We leave the details as an exercise to the reader. 


There is a very nice geometric interpretation of the second rank theorem
(Theorem 7.2). Namely, statement 1 of the theorem says that if a
transformation $A : \mathbb{F}^n \to \mathbb{F}^m$ has trivial kernel ($\operatorname{Ker} A = \{0\}$), then the
dimensions of the domain $\mathbb{F}^n$ and of the range $\operatorname{Ran} A$ coincide. If the kernel
is non-trivial, then the transformation “kills” $\dim \operatorname{Ker} A$ dimensions, so
$\dim \operatorname{Ran} A = n - \dim \operatorname{Ker} A$.


7.4. Completion of a linearly independent system to a basis. As
Proposition 5.4 from Section 5 above asserts, any linearly independent
system can be completed to a basis, i.e. given linearly independent vectors
$v_1, v_2, \dots, v_r$ in a finite-dimensional vector space $V$, one can find vectors
$v_{r+1}, v_{r+2}, \dots, v_n$ such that the system of vectors $v_1, v_2, \dots, v_n$ is a basis in
$V$.


Theoretically, the proof of this proposition gives us an algorithm for finding
the vectors $v_{r+1}, v_{r+2}, \dots, v_n$, but this algorithm does not look too practical.


Ideas of this section give us a more practical way to perform the com- 
pletion to a basis. 


First of all, notice that if an $m \times n$ matrix is in an echelon form, then its
non-zero rows (which are clearly linearly independent) can be easily
completed to a basis in the whole space $\mathbb{F}^n$; one just needs to add some rows in
appropriate places, so the resulting matrix is still in the echelon form and
has pivots in every column.


Then, the non-zero rows of the new matrix form a basis, and we can
order it any way we want, because the property of being a basis does not depend
on the ordering.


Suppose now that we have linearly independent vectors $v_1, v_2, \dots, v_r$,
$v_k \in \mathbb{F}^n$. Consider the matrix $A$ with rows $v_1^T, v_2^T, \dots, v_r^T$ and perform row
operations to get the echelon form $A_e$. As we discussed above, the rows of
$A_e$ can be easily completed to a basis in $\mathbb{F}^n$. And it turns out that the same
vectors that complete the rows of $A_e$ to a basis also complete the original
vectors $v_1, v_2, \dots, v_r$ to a basis.

To see that, let vectors $v_{r+1}, \dots, v_n$ complete the rows of $A_e$ to a basis in
$\mathbb{F}^n$. Then, if we add to the matrix $A_e$ the rows $v_{r+1}^T, \dots, v_n^T$, we get an invertible
matrix. Let us call this matrix $\tilde{A}_e$, and let $\tilde{A}$ be the matrix obtained from $A$
by adding the rows $v_{r+1}^T, \dots, v_n^T$. The matrix $\tilde{A}_e$ can be obtained from $\tilde{A}$ by
row operations, so
$$\tilde{A}_e = E\tilde{A},$$
where $E$ is the product of the corresponding elementary matrices. Then
$\tilde{A} = E^{-1}\tilde{A}_e$, and $\tilde{A}$ is invertible as a product of invertible matrices.


But that means that the rows of $\tilde{A}$ form a basis in $\mathbb{F}^n$, which is exactly
what we need.
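
The completion procedure can be sketched in code. This is a minimal illustration under two assumptions: the input vectors are linearly independent, and the added vectors are standard basis vectors $e_c$, one for each column $c$ without a pivot (one convenient way of "adding rows in appropriate places"). The function names and the example vectors are hypothetical.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Row reduce a copy of `rows` and return the indices of the pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    for col in range(len(m[0])):
        r0 = len(pivots)  # row where the next pivot would go
        pr = next((r for r in range(r0, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[r0], m[pr] = m[pr], m[r0]
        for r in range(r0 + 1, len(m)):
            f = m[r][col] / m[r0][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[r0])]
        pivots.append(col)
    return pivots

def complete_to_basis(vectors):
    """Append e_c for every column c that has no pivot in the echelon form."""
    n = len(vectors[0])
    pivots = pivot_columns(vectors)
    extra = [[1 if i == c else 0 for i in range(n)] for c in range(n) if c not in pivots]
    return vectors + extra

vs = [[1, 2, 3], [2, 4, 7]]      # hypothetical linearly independent vectors
basis = complete_to_basis(vs)
print(basis)
```

For these two vectors the echelon form has pivots in the first and third columns, so the single added vector is $e_2$; row reducing the completed system then gives a pivot in every column.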


Remark. The method of completion to a basis described above may be not 
the simplest one, but one of its principal advantages is that it works for 
vector spaces over an arbitrary field. 

Exercises. 

7.1. True or false:

a) The rank of a matrix is equal to the number of its non-zero columns;

b) The $m \times n$ zero matrix is the only $m \times n$ matrix having rank 0;

c) Elementary row operations preserve rank;

d) Elementary column operations do not necessarily preserve rank;

e) The rank of a matrix is equal to the maximum number of linearly
independent columns in the matrix;

f) The rank of a matrix is equal to the maximum number of linearly
independent rows in the matrix;

g) The rank of an $n \times n$ matrix is at most $n$;

h) An $n \times n$ matrix having rank $n$ is invertible.


7.2. A 54 x 37 matrix has rank 31. What are dimensions of all 4 fundamental 
subspaces? 


7.3. Compute rank and find bases of all four fundamental subspaces for the matrices 


12 Be Ae a 
ee 14012 
tha pa Boro 
120 120-0 


7.4. Prove that if $A : X \to Y$ and $V$ is a subspace of $X$ then $\dim AV \le \operatorname{rank} A$. ($AV$
here means the subspace $V$ transformed by the transformation $A$, i.e. any vector in
$AV$ can be represented as $Av$, $v \in V$.) Deduce from here that $\operatorname{rank}(AB) \le \operatorname{rank} A$.
Remark: Here one can use the fact that if $V \subset W$ then $\dim V \le \dim W$. Do you
understand why it is true?


7.5. Prove that if $A : X \to Y$ and $V$ is a subspace of $X$ then $\dim AV \le \dim V$.
Deduce from here that $\operatorname{rank}(AB) \le \operatorname{rank} B$.


7.6. Prove that if the product AB of two n x n matrices is invertible, then both A 
and B are invertible. Even if you know about determinants, do not use them, we 
did not cover them yet. Hint: use previous 2 problems. 


7.7. Prove that if $Ax = 0$ has a unique solution, then the equation $A^T x = b$ has a
solution for every right side $b$.
Hint: count pivots.


7.8. Write a matrix with the required property, or explain why no such matrix 
exists 


a) Column space contains $(1,0,0)^T$, $(0,0,1)^T$, row space contains $(1,1)^T$,
$(1,2)^T$;
b) Column space is spanned by $(1,1,1)^T$, nullspace is spanned by $(1,2,3)^T$;


c) Column space is R*, row space is R®. 
Hint: Check first if the dimensions add up. 
7.9. If A has the same four fundamental subspaces as B, does A = B? 


7.10. Complete the rows of a matrix 


e 3 4 0 -r 6 -2 
002 -1 nr 1 1 
000 0 3 -3 2 
000 0 0 0 1 


to a basis in R®. 


7.11. For a matrix 


find bases in its column and row spaces. 


7.12. For the matrix in the previous problem, complete the basis in the row space 


to a basis in R® 
1 a 
A=( a i) 


compute Ran A and Ker A. What can you say about relation between these sub- 
spaces? 


7.13. For the matrix 


7.14. Is it possible for a real matrix $A$ that $\operatorname{Ran} A = \operatorname{Ker} A^T$? Is it possible
for a complex $A$?


7.15. Complete the vectors $(1,2,-1,2,3)^T$, $(2,2,1,5,5)^T$, $(-1,-4,4,7,-11)^T$ to
a basis in $\mathbb{R}^5$.
8. Representation of a linear transformation in arbitrary
bases. Change of coordinates formula.


The material we have learned about linear transformations and their matri- 
ces can be easily extended to transformations in abstract vector spaces with 
finite bases. In this section we will distinguish between a linear transformation
$T$ and its matrix, the reason being that we consider different bases, so
a linear transformation can have different matrix representations.


8.1. Coordinate vector. Let $V$ be a vector space with a basis $\mathcal{B} :=
\{b_1, b_2, \dots, b_n\}$. Any vector $v \in V$ admits a unique representation as a
linear combination
$$v = x_1 b_1 + x_2 b_2 + \dots + x_n b_n = \sum_{k=1}^{n} x_k b_k.$$
The numbers $x_1, x_2, \dots, x_n$ are called the coordinates of the vector $v$ in
the basis $\mathcal{B}$. It is convenient to join these coordinates into the so-called
coordinate vector of $v$ relative to the basis $\mathcal{B}$, which is the column vector
$$[v]_{\mathcal{B}} := \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{F}^n.$$
Note that the mapping
$$v \mapsto [v]_{\mathcal{B}}$$
is an isomorphism between $V$ and $\mathbb{F}^n$. It transforms the basis $b_1, b_2, \dots, b_n$
to the standard basis $e_1, e_2, \dots, e_n$ in $\mathbb{F}^n$.


8.2. Matrix of a linear transformation. Let $T : V \to W$ be a linear
transformation, and let $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$, $\mathcal{B} := \{b_1, b_2, \dots, b_m\}$ be bases
in $V$ and $W$ respectively.

A matrix of the transformation $T$ in (or with respect to) the bases $\mathcal{A}$
and $\mathcal{B}$ is an $m \times n$ matrix, denoted by $[T]_{\mathcal{BA}}$, which relates the coordinate
vectors $[Tv]_{\mathcal{B}}$ and $[v]_{\mathcal{A}}$,
$$[Tv]_{\mathcal{B}} = [T]_{\mathcal{BA}} [v]_{\mathcal{A}};$$
notice the balance of symbols $\mathcal{A}$ and $\mathcal{B}$ here: this is the reason we put the
first basis $\mathcal{A}$ into the second position.

The matrix $[T]_{\mathcal{BA}}$ is easy to find: its $k$th column is just the coordinate
vector $[Ta_k]_{\mathcal{B}}$ (compare this with finding the matrix of a linear transformation
from $\mathbb{F}^n$ to $\mathbb{F}^m$).
As in the case of standard bases, composition of linear transformations
is equivalent to multiplication of their matrices: one only has to be a bit
more careful about bases. Namely, let $T_1 : X \to Y$ and $T_2 : Y \to Z$ be linear
transformations, and let $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$ be bases in $X$, $Y$ and $Z$ respectively.
Then for the composition $T = T_2 T_1$,
$$T : X \to Z, \qquad Tx := T_2(T_1(x)),$$
we have
$$[T]_{\mathcal{CA}} = [T_2 T_1]_{\mathcal{CA}} = [T_2]_{\mathcal{CB}} [T_1]_{\mathcal{BA}} \tag{8.1}$$
(notice again the balance of indices here).


The proof here goes exactly as in the case of $\mathbb{F}^n$ spaces with standard
bases, so we do not repeat it here. Another possibility is to transfer everything
to the spaces $\mathbb{F}^n$ via the coordinate isomorphisms $v \mapsto [v]_{\mathcal{B}}$. Then one
does not need any proof, everything follows from the results about matrix
multiplication.


8.3. Change of coordinate matrix. Let us have two bases $\mathcal{A} =
\{a_1, a_2, \dots, a_n\}$ and $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ in a vector space $V$. Consider
the identity transformation $I = I_V$ and its matrix $[I]_{\mathcal{BA}}$ in these bases. By
the definition
$$[v]_{\mathcal{B}} = [I]_{\mathcal{BA}} [v]_{\mathcal{A}}, \qquad \forall v \in V,$$
i.e. for any vector $v \in V$ the matrix $[I]_{\mathcal{BA}}$ transforms its coordinates in the
basis $\mathcal{A}$ into coordinates in the basis $\mathcal{B}$. The matrix $[I]_{\mathcal{BA}}$ is often called the
change of coordinates (from the basis $\mathcal{A}$ to the basis $\mathcal{B}$) matrix.

The matrix $[I]_{\mathcal{BA}}$ is easy to compute: according to the general rule of
finding the matrix of a linear transformation, its $k$th column is the coordinate
representation $[a_k]_{\mathcal{B}}$ of the $k$th element of the basis $\mathcal{A}$.

Note that
$$[I]_{\mathcal{AB}} = \left([I]_{\mathcal{BA}}\right)^{-1}$$
(this follows immediately from the multiplication of matrices rule (8.1)), so any
change of coordinate matrix is always invertible.
8.3.1. An example: change of coordinates from the standard basis. Let our
space $V$ be $\mathbb{F}^n$, and let us have a basis $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ there. We
also have the standard basis $\mathcal{S} = \{e_1, e_2, \dots, e_n\}$ there. The change of
coordinates matrix $[I]_{\mathcal{SB}}$ is easy to compute:
$$[I]_{\mathcal{SB}} = [b_1, b_2, \dots, b_n] =: B,$$
i.e. it is just the matrix $B$ whose $k$th column is the vector (column) $b_k$. And
in the other direction
$$[I]_{\mathcal{BS}} = \left([I]_{\mathcal{SB}}\right)^{-1} = B^{-1}.$$
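For example, consider a basis
$$\mathcal{B} = \left\{ \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \end{pmatrix} \right\}$$
in $\mathbb{F}^2$, and let $\mathcal{S}$ denote the standard basis there. Then
$$[I]_{\mathcal{SB}} = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix} =: B$$
and
$$[I]_{\mathcal{BS}} = \left([I]_{\mathcal{SB}}\right)^{-1} = B^{-1} = \frac{1}{3} \begin{pmatrix} -1 & 2 \\ 2 & -1 \end{pmatrix}$$
(we know how to compute inverses, and it is also easy to check that the
above matrix is indeed the inverse of $B$).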
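8.3.2. An example: going through the standard basis. In the space of
polynomials of degree at most 1 we have bases
$$\mathcal{A} = \{1, 1 + x\}, \quad \text{and} \quad \mathcal{B} = \{1 + 2x, 1 - 2x\},$$
and we want to find the change of coordinate matrix $[I]_{\mathcal{BA}}$.


Of course, we can always take vectors from the basis $\mathcal{A}$ and try to
decompose them in the basis $\mathcal{B}$; it involves solving linear systems, and we know
how to do that.


However, I think the following way is simpler. In $\mathbb{P}_1$ we also have the
standard basis $\mathcal{S} = \{1, x\}$, and for this basis
$$[I]_{\mathcal{SA}} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} =: A, \qquad [I]_{\mathcal{SB}} = \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix} =: B,$$
and taking the inverses
$$[I]_{\mathcal{AS}} = A^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}, \qquad [I]_{\mathcal{BS}} = B^{-1} = \frac{1}{4} \begin{pmatrix} 2 & 1 \\ 2 & -1 \end{pmatrix}.$$
Then
$$[I]_{\mathcal{BA}} = [I]_{\mathcal{BS}} [I]_{\mathcal{SA}} = B^{-1}A = \frac{1}{4} \begin{pmatrix} 2 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
and
$$[I]_{\mathcal{AB}} = [I]_{\mathcal{AS}} [I]_{\mathcal{SB}} = A^{-1}B = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix}.$$
(Notice the balance of indices here.)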
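
The products in this example can be multiplied out with a short sketch. It assumes the bases $\{1, 1+x\}$ and $\{1+2x, 1-2x\}$ with the polynomial variable written as $x$, and the helper names `matmul` and `inverse_2x2` are introduced only for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 1], [0, 1]]    # [I]_{SA}: columns are the coordinates of 1 and 1 + x in {1, x}
B = [[1, 1], [2, -2]]   # [I]_{SB}: columns are the coordinates of 1 + 2x and 1 - 2x

I_BA = matmul(inverse_2x2(B), A)   # [I]_{BA} = [I]_{BS} [I]_{SA}
I_AB = matmul(inverse_2x2(A), B)   # [I]_{AB} = [I]_{AS} [I]_{SB}
print(I_BA, I_AB)
```

The second column of `I_BA` says that $1 + x = \frac34 (1 + 2x) + \frac14 (1 - 2x)$, which is easy to confirm by hand, and the two matrices come out inverse to each other, as they must.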


8.4. Matrix of a transformation and change of coordinates. Let
$T : V \to W$ be a linear transformation, and let $\mathcal{A}$, $\tilde{\mathcal{A}}$ be two bases in $V$ and
let $\mathcal{B}$, $\tilde{\mathcal{B}}$ be two bases in $W$. Suppose we know the matrix $[T]_{\mathcal{BA}}$, and we
would like to find the matrix representation with respect to new bases $\tilde{\mathcal{A}}$,
$\tilde{\mathcal{B}}$, i.e. the matrix $[T]_{\tilde{\mathcal{B}}\tilde{\mathcal{A}}}$. The rule is very simple:


to get the matrix in the “new” bases one has to surround the
matrix in the “old” bases by change of coordinates matrices.


I did not mention here what change of coordinate matrix should go where,
because we don’t have any choice if we follow the balance of indices rule.
Namely, matrix representation of a linear transformation changes according
to the formula
$$[T]_{\tilde{\mathcal{B}}\tilde{\mathcal{A}}} = [I]_{\tilde{\mathcal{B}}\mathcal{B}} [T]_{\mathcal{BA}} [I]_{\mathcal{A}\tilde{\mathcal{A}}}.$$
(Notice the balance of indices.)


The proof can be done just by analyzing what each of the matrices does.


8.5. Case of one basis: similar matrices. Let $V$ be a vector space and
let $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$ be a basis in $V$. Consider a linear transformation
$T : V \to V$ and let $[T]_{\mathcal{AA}}$ be its matrix in this basis (we use the same basis
for “inputs” and “outputs”).

The case when we use the same basis for “inputs” and “outputs” is
very important (because in this case we can multiply a matrix by itself), so
let us study this case a bit more carefully. Notice that very often in this
case the shorter notation $[T]_{\mathcal{A}}$ is used instead of $[T]_{\mathcal{AA}}$. However, the two
index notation $[T]_{\mathcal{AA}}$ is better adapted to the balance of indices rule, so I
recommend using it (or at least always keeping it in mind) when doing change
of coordinates.

Let $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ be another basis in $V$. By the change of
coordinate rule above
$$[T]_{\mathcal{BB}} = [I]_{\mathcal{BA}} [T]_{\mathcal{AA}} [I]_{\mathcal{AB}}.$$
Recalling that
$$[I]_{\mathcal{BA}} = \left([I]_{\mathcal{AB}}\right)^{-1}$$
and denoting $Q := [I]_{\mathcal{AB}}$, we can rewrite the above formula as
$$[T]_{\mathcal{BB}} = Q^{-1} [T]_{\mathcal{AA}} Q.$$
This gives a motivation for the following definition.


Definition 8.1. We say that a matrix $A$ is similar to a matrix $B$ if there
exists an invertible matrix $Q$ such that $A = Q^{-1}BQ$.


Since an invertible matrix must be square, it follows from counting
dimensions that similar matrices $A$ and $B$ have to be square and of the same
size. If $A$ is similar to $B$, i.e. if $A = Q^{-1}BQ$, then
$$B = QAQ^{-1} = (Q^{-1})^{-1} A (Q^{-1})$$
(since $Q^{-1}$ is invertible), therefore $B$ is similar to $A$. So, we can just say
that $A$ and $B$ are similar.

The above reasoning shows that it does not matter where to put $Q$
and where $Q^{-1}$: one can use the formula $A = QBQ^{-1}$ in the definition of
similarity.

The above discussion shows that one can treat similar matrices as
different matrix representations of the same linear operator (transformation).
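
As a small illustration of similarity as a change of basis, the sketch below takes a hypothetical operator on $\mathbb{F}^2$ given by its matrix in the standard basis and a hypothetical basis $b_1 = (1,0)^T$, $b_2 = (1,1)^T$, and computes $Q^{-1}[T]_{\mathcal{SS}}Q$ with $Q = [I]_{\mathcal{SB}}$. Neither the operator nor the basis comes from the text, and the helper names are introduced only for this illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

T_SS = [[2, 1], [0, 3]]   # hypothetical operator in the standard basis
Q = [[1, 1], [0, 1]]      # Q = [I]_{SB}: columns are b_1 = (1,0)^T and b_2 = (1,1)^T

T_BB = matmul(matmul(inverse_2x2(Q), T_SS), Q)   # [T]_{BB} = Q^{-1} [T]_{SS} Q
print(T_BB)
```

In this example the matrix in the new basis turns out to be diagonal, $\operatorname{diag}\{2, 3\}$: the two matrices look different but represent the same operator.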


Exercises. 
8.1. True or false
a) Every change of coordinate matrix is square;
b) Every change of coordinate matrix is invertible;
c) The matrices $A$ and $B$ are called similar if $B = Q^T A Q$ for some matrix $Q$;
d) The matrices $A$ and $B$ are called similar if $B = Q^{-1}AQ$ for some matrix
$Q$;
e) Similar matrices do not need to be square.


8.2. Consider the system of vectors
$$(1,2,1,1)^T, \quad (0,1,3,1)^T, \quad (0,3,2,0)^T, \quad (0,1,0,0)^T.$$
a) Prove that it is a basis in $\mathbb{F}^4$. Try to do a minimal amount of computations.


b) Find the change of coordinate matrix that changes the coordinates in this
basis to the standard coordinates in $\mathbb{F}^4$ (i.e. to the coordinates in the
standard basis $e_1, \dots, e_4$).


8.3. Find the change of coordinates matrix that changes the coordinates in the
basis $1, 1 + t$ in $\mathbb{P}_1$ to the coordinates in the basis $1 - t, 2t$.


8.4. Let $T$ be the linear operator in $\mathbb{F}^2$ defined (in the standard coordinates) by
$$T \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3x + y \\ x - 2y \end{pmatrix}.$$
Find the matrix of $T$ in the standard basis and in the basis
$$(1,1)^T, \quad (1,2)^T.$$


8.5. Prove, that if A and B are similar matrices then trace A = trace B. Hint: 
recall how trace(XY) and trace(Y X) are related. 


(22) m= (2) 


8.6. Are the matrices 


similar? Justify. 


Chapter 8 


Determinants 


1. Introduction. 


The reader probably already met determinants in calculus or algebra, at
least the determinants of $2 \times 2$ and $3 \times 3$ matrices. For a $2 \times 2$ matrix
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$
the determinant is simply $ad - bc$; the determinant of a $3 \times 3$ matrix can be
found by the “Star of David” rule.


In this chapter we would like to introduce determinants for $n \times n$ matrices.
I don’t want just to give a formal definition. First I want to give some
motivation, and then derive some properties the determinant should have.
Then if we want to have these properties, we do not have any choice, and
arrive at several equivalent definitions of the determinant.

It is more convenient to start not with the determinant of a matrix, but 
with determinant of a system of vectors. There is no real difference here, 
since we always can join vectors together (say as columns) to form a matrix. 

Let us have $n$ vectors $v_1, v_2, \dots, v_n$ in $\mathbb{R}^n$ (notice that the number of
vectors coincides with dimension), and we want to find the $n$-dimensional
volume of the parallelepiped determined by these vectors.

The parallelepiped determined by the vectors $v_1, v_2, \dots, v_n$ can be
defined as the collection of all vectors $v \in \mathbb{R}^n$ that can be represented as
$$v = t_1 v_1 + t_2 v_2 + \dots + t_n v_n, \qquad 0 \le t_k \le 1 \quad \forall k = 1, 2, \dots, n.$$


It can be easily visualized when n = 2 (parallelogram) and n = 3 (paral- 
lelepiped). So, what is the n-dimensional volume? 
If $n = 2$ it is area; if $n = 3$ it is indeed the volume. In dimension 1 it is
just the length.

Finally, let us introduce some notation. For a system of vectors (columns)
$v_1, v_2, \dots, v_n$ we will denote its determinant (that we are going to
construct) as $D(v_1, v_2, \dots, v_n)$. If we join these vectors in a matrix $A$
(column number $k$ of $A$ is $v_k$), then we will use the notation $\det A$,
$$\det A = D(v_1, v_2, \dots, v_n).$$
Also, for a matrix
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,n} \\ a_{2,1} & a_{2,2} & \dots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \dots & a_{n,n} \end{pmatrix}$$
its determinant is often denoted by
$$\begin{vmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,n} \\ a_{2,1} & a_{2,2} & \dots & a_{2,n} \\ \vdots & \vdots & & \vdots \\ a_{n,1} & a_{n,2} & \dots & a_{n,n} \end{vmatrix}.$$


2. What properties determinant should have. 


We know, that for dimensions 2 and 3 “volume” of a parallelepiped is de- 
termined by the base times height rule: if we pick one vector, then height 
is the distance from this vector to the subspace spanned by the remaining 
vectors, and the base is the (n — 1)-dimensional volume of the parallelepiped 
determined by the remaining vectors. 


Now let us generalize this idea to higher dimensions. For a moment 
we do not care about how exactly to determine height and base. We will 
show, that if we assume that the base and the height satisfy some natural 
properties, then we do not have any choice, and the volume (determinant) 
is uniquely defined. 


2.1. Linearity in each argument. First of all, if we multiply vector $v_1$
by a positive number $\alpha$, then the height (i.e. the distance to the linear span
$\mathcal{L}(v_2, \dots, v_n)$) is multiplied by $\alpha$. If we admit negative heights (and negative
volumes), then this property holds for all scalars $\alpha$, and so the determinant
$D(v_1, v_2, \dots, v_n)$ of the system $v_1, v_2, \dots, v_n$ should satisfy
$$D(\alpha v_1, v_2, \dots, v_n) = \alpha D(v_1, v_2, \dots, v_n).$$
Of course, there is nothing special about vector $v_1$, so for any index $k$
$$D(v_1, \dots, \underbrace{\alpha v_k}_{k}, \dots, v_n) = \alpha D(v_1, \dots, \underbrace{v_k}_{k}, \dots, v_n). \tag{2.1}$$
To get the next property, let us notice that if we add 2 vectors, then the
“height” of the result should be equal to the sum of the “heights” of summands,
i.e. that
$$D(v_1, \dots, \underbrace{u_k + v_k}_{k}, \dots, v_n) = D(v_1, \dots, \underbrace{u_k}_{k}, \dots, v_n) + D(v_1, \dots, \underbrace{v_k}_{k}, \dots, v_n). \tag{2.2}$$


In other words, the above two properties say that the determinant of n 
vectors is linear in each argument (vector), meaning that if we fix n — 1 
vectors and interpret the remaining vector as a variable (argument), we get 
a linear function. 


Remark. We already know that linearity is a very nice property, that helps 
in many situations. So, admitting negative heights (and therefore negative 
volumes) is a very small price to pay to get linearity, since we can always 
put on the absolute value afterwards. 


In fact, by admitting negative heights, we did not sacrifice anything! To 
the contrary, we even gained something, because the sign of the determinant 
contains some information about the system of vectors (orientation). 


2.2. Preservation under “column replacement”. The next property
also seems natural. Namely, if we take a vector, say $v_j$, and add to it a
multiple of another vector $v_k$, the “height” does not change, so
$$D(v_1, \dots, \underbrace{v_j + \alpha v_k}_{j}, \dots, \underbrace{v_k}_{k}, \dots, v_n) = D(v_1, \dots, \underbrace{v_j}_{j}, \dots, \underbrace{v_k}_{k}, \dots, v_n). \tag{2.3}$$
In other words, if we apply the column operation of the third type, the
determinant does not change.


Remark. Although it is not essential here, let us notice that the second 
part of linearity (property (2.2)) is not independent: it can be deduced from 
properties (2.1) and (2.3). 


We leave the proof as an exercise for the reader. 
2.3. Antisymmetry. The next property the determinant should have is
that if we interchange 2 vectors, the determinant changes sign:
$$D(v_1, \dots, \underbrace{v_k}_{j}, \dots, \underbrace{v_j}_{k}, \dots, v_n) = -D(v_1, \dots, \underbrace{v_j}_{j}, \dots, \underbrace{v_k}_{k}, \dots, v_n). \tag{2.4}$$
Functions of several variables that change sign when one interchanges
any two arguments are called antisymmetric.
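At first sight this property does not look natural, but it can be deduced
from the previous ones. Namely, applying property (2.3) three times, and
then using (2.1), we get
$$\begin{aligned}
D(v_1, \dots, \underbrace{v_j}_{j}, \dots, \underbrace{v_k}_{k}, \dots, v_n)
&= D(v_1, \dots, \underbrace{v_j}_{j}, \dots, \underbrace{v_k - v_j}_{k}, \dots, v_n) \\
&= D(v_1, \dots, \underbrace{v_j + (v_k - v_j)}_{j}, \dots, \underbrace{v_k - v_j}_{k}, \dots, v_n) \\
&= D(v_1, \dots, \underbrace{v_k}_{j}, \dots, \underbrace{v_k - v_j}_{k}, \dots, v_n) \\
&= D(v_1, \dots, \underbrace{v_k}_{j}, \dots, \underbrace{(v_k - v_j) - v_k}_{k}, \dots, v_n) \\
&= D(v_1, \dots, \underbrace{v_k}_{j}, \dots, \underbrace{-v_j}_{k}, \dots, v_n) \\
&= -D(v_1, \dots, \underbrace{v_k}_{j}, \dots, \underbrace{v_j}_{k}, \dots, v_n).
\end{aligned}$$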


2.4. Normalization. The last property is the easiest one. For the standard
basis $e_1, e_2, \dots, e_n$ in $\mathbb{R}^n$ the corresponding parallelepiped is the
$n$-dimensional unit cube, so
$$D(e_1, e_2, \dots, e_n) = 1. \tag{2.5}$$
In matrix notation this can be written as
$$\det(I) = 1.$$


3. Constructing the determinant. 


The plan of the game is now as follows: using the properties that as we 
decided in Section 2 the determinant should have, we derive other properties 
of the determinant, some of them highly non-trivial. We will show how to 
use these properties to compute the determinant using our old friend—row 
reduction. 


Later, in Section 4, we will show that the determinant, i.e. a function
with the desired properties, exists and is unique. After all, we have to be sure
that the object we are computing and studying exists.


While our initial geometric motivation for determinant and its properties
came from considering vectors in the real vector space $\mathbb{R}^n$, so they relate only
to matrices with real entries, all the constructions below use only algebraic
operations (addition, multiplication, division) and are applicable to matrices
with complex entries, and even with entries in an arbitrary field.

So in what follows we are constructing determinant not just for real
matrices, but for complex matrices as well (and also for matrices with entries
in an arbitrary field). The nice geometric motivation for the properties
works only in the real case, but after we decided on the properties of the
determinant (see properties 1-3 below) everything works in the general case.


3.1. Basic properties. We will use the following basic properties of the
determinant:


1. Determinant is linear in each column, i.e. in vector notation for every
index $k$
$$D(v_1, \dots, \underbrace{\alpha u_k + \beta v_k}_{k}, \dots, v_n) = \alpha D(v_1, \dots, \underbrace{u_k}_{k}, \dots, v_n) + \beta D(v_1, \dots, \underbrace{v_k}_{k}, \dots, v_n)$$
for all scalars $\alpha$, $\beta$.


2. Determinant is antisymmetric, i.e. if one interchanges two columns,
the determinant changes sign.


3. Normalization property: $\det I = 1$.

All these properties were discussed above in Section 2. The first property
is just (2.1) and (2.2) combined. The second one is (2.4), and the last one
is the normalization property (2.5). Note that we did not use property (2.3):
it can be deduced from the above three. These three properties completely
define determinant!


3.2. Properties of determinant deduced from the basic properties. 


Proposition 3.1. For a square matrix $A$ the following statements hold:

1. If $A$ has a zero column, then $\det A = 0$;

2. If $A$ has two equal columns, then $\det A = 0$;

3. If one column of $A$ is a multiple of another, then $\det A = 0$;

4. If columns of $A$ are linearly dependent, i.e. if the matrix is not
invertible, then $\det A = 0$.


Proof. Statement 1 follows immediately from linearity. If we multiply the 
zero column by zero, we do not change the matrix and its determinant. But 
by the property 1 above, we should get 0. 


The fact that determinant is antisymmetric implies statement 2. Indeed,
if we interchange two equal columns, we change nothing, so the determinant
remains the same. On the other hand, interchanging two columns
changes sign of determinant, so
$$\det A = -\det A,$$
which is possible only if $\det A = 0$.

Statement 3 is an immediate corollary of statement 2 and linearity.


To prove the last statement, let us first suppose that the first vector $v_1$
is a linear combination of the other vectors,
$$v_1 = \alpha_2 v_2 + \alpha_3 v_3 + \dots + \alpha_n v_n = \sum_{k=2}^{n} \alpha_k v_k.$$
Then by linearity we have (in vector notation)
$$D(v_1, v_2, \dots, v_n) = D\Big(\sum_{k=2}^{n} \alpha_k v_k, v_2, v_3, \dots, v_n\Big) = \sum_{k=2}^{n} \alpha_k D(v_k, v_2, v_3, \dots, v_n)$$
and each determinant in the sum is zero because of two equal columns.

Let us now consider the general case, i.e. let us assume that the system
$v_1, v_2, \dots, v_n$ is linearly dependent. Then one of the vectors, say $v_k$, can be
represented as a linear combination of the others. Interchanging this vector
with $v_1$ we arrive at the situation we just treated, so
$$D(v_1, \dots, v_k, \dots, v_n) = -D(v_k, \dots, v_1, \dots, v_n) = -0 = 0,$$
so the determinant in this case is also 0.


The next proposition generalizes property (2.3). As we already have
said above, this property can be deduced from the three “basic” properties
of the determinant we are using in this section.


Proposition 3.2. The determinant does not change if we add to a column
a linear combination of the other columns (leaving the other columns
intact). In particular, the determinant is preserved under “column replacement”
(column operation of third type).

Note that adding to a column a multiple of itself is prohibited here. We can
only add multiples of the other columns.


Proof. Fix a vector $v_k$, and let $u$ be a linear combination of the other
vectors,
$$u = \sum_{j \ne k} \alpha_j v_j.$$
Then by linearity
$$D(v_1, \dots, \underbrace{v_k + u}_{k}, \dots, v_n) = D(v_1, \dots, \underbrace{v_k}_{k}, \dots, v_n) + D(v_1, \dots, \underbrace{u}_{k}, \dots, v_n),$$
and by Proposition 3.1 the last term is zero.


3.3. Determinants of diagonal and triangular matrices. Now we are
ready to compute determinant for some important special classes of matrices.
The first class is the so-called diagonal matrices. Let us recall that a square
matrix $A = \{a_{j,k}\}_{j,k=1}^{n}$ is called diagonal if all entries off the main diagonal
are zero, i.e. if $a_{j,k} = 0$ for all $j \ne k$. We will often use the notation
$\operatorname{diag}\{a_1, a_2, \dots, a_n\}$ for the diagonal matrix
$$\begin{pmatrix} a_1 & 0 & \dots & 0 \\ 0 & a_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a_n \end{pmatrix}.$$
Since a diagonal matrix $\operatorname{diag}\{a_1, a_2, \dots, a_n\}$ can be obtained from the
identity matrix $I$ by multiplying column number $k$ by $a_k$,


Determinant of a diagonal matrix equals the product of the diagonal
entries,
$$\det(\operatorname{diag}\{a_1, a_2, \dots, a_n\}) = a_1 a_2 \dots a_n.$$


The next important class is the class of so-called triangular matrices. A
square matrix $A = \{a_{j,k}\}_{j,k=1}^{n}$ is called upper triangular if all entries below
the main diagonal are 0, i.e. if $a_{j,k} = 0$ for all $k < j$. A square matrix is
called lower triangular if all entries above the main diagonal are 0, i.e. if $a_{j,k} = 0$ for
all $j < k$. We call a matrix triangular if it is either a lower or an upper triangular
matrix.


It is easy to see that


Determinant of a triangular matrix equals the product of the
diagonal entries,
$$\det A = a_{1,1} a_{2,2} \dots a_{n,n}.$$


Indeed, if a triangular matrix has a zero on the main diagonal, it is not
invertible (this can easily be checked by column operations) and therefore
both sides equal zero. If all diagonal entries are non-zero, then using column
replacement (column operations of third type) one can transform the matrix
into a diagonal one with the same diagonal entries: for an upper triangular
matrix one should first subtract appropriate multiples of the first column
from the columns number $2, 3, \dots, n$, “killing” all entries in the first row,
then subtract appropriate multiples of the second column from columns
number $3, \dots, n$, and so on.

To treat the case of lower triangular matrices one has to do “column
reduction” from the right to the left, i.e. first subtract appropriate multiples
of the last column from columns number $n - 1, \dots, 2, 1$, and so on.


3.4. Computing the determinant. Now we know how to compute
determinants, using their properties: one just needs to do column reduction
(i.e. row reduction for $A^T$) keeping track of column operations changing
the determinant. Fortunately, the most often used operation, column replacement
(i.e. row replacement for $A^T$, the operation of third type), does not change
the determinant. So we only need to keep track of interchanging of columns and
of multiplication of a column by a scalar.


If an echelon form of $A^T$ does not have pivots in every column (and
row), then $A$ is not invertible, so $\det A = 0$. If $A$ is invertible, we arrive at
a triangular matrix, and $\det A$ is the product of diagonal entries times the
correction from column interchanges and multiplications.
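
The procedure just described can be sketched as follows. This is one possible arrangement, not the only one: it uses only column interchanges (each flips the sign) and column replacements (which leave the determinant unchanged), so no scalar multiplications need tracking, and it reduces the matrix to lower triangular form. The function name is chosen only for this illustration.

```python
from fractions import Fraction

def det_by_column_reduction(A):
    """Determinant via column reduction to lower triangular form (exact arithmetic)."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for i in range(n):
        # find a column j >= i with a non-zero entry in row i
        j = next((c for c in range(i, n) if M[i][c] != 0), None)
        if j is None:
            return Fraction(0)    # no pivot in row i: the matrix is not invertible
        if j != i:
            for row in M:         # column interchange changes the sign
                row[i], row[j] = row[j], row[i]
            sign = -sign
        for c in range(i + 1, n):
            # column replacement: subtract a multiple of column i from column c
            factor = M[i][c] / M[i][i]
            for row in M:
                row[c] -= factor * row[i]
    product = Fraction(sign)
    for i in range(n):
        product *= M[i][i]        # product of the diagonal entries
    return product

print(det_by_column_reduction([[1, 2], [3, 4]]))
print(det_by_column_reduction([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))
```

For the $2 \times 2$ matrix the result agrees with the formula $ad - bc$.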


The above algorithm implies that $\det A$ can be zero only if a matrix $A$
is not invertible. Combining this with the last statement of Proposition 3.1
we get


Proposition 3.3. $\det A = 0$ if and only if $A$ is not invertible. An equivalent
statement: $\det A \ne 0$ if and only if $A$ is invertible.


Note, that although we now know how to compute determinants, the 
determinant is still not defined. One can ask: why don’t we define it as 
the result we get from the above algorithm? The problem is that formally 
this result is not well defined: that means we did not prove that different 
sequences of column operations yield the same answer. 


3.5. Determinants of a transpose and of a product. Determinants 
of elementary matrices. In this section we prove two important theorems. 


Theorem 3.4 (Determinant of a transpose). For a square matrix $A$,
$$\det A = \det(A^T).$$
This theorem implies that for all statements about columns we discussed
above, the corresponding statements about rows are also true. In particular,
determinants behave under row operations the same way they behave under
column operations. So, we can use row operations to compute determinants.


Theorem 3.5 (Determinant of a product). For $n \times n$ matrices $A$ and $B$
$$\det(AB) = (\det A)(\det B).$$


In other words 


Determinant of a product equals product of determinants. 
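
Both theorems are easy to test on examples before reading the proofs. The sketch below uses a compact column-reduction routine for `det` (any correct determinant routine would do) and two hypothetical $3 \times 3$ matrices that are not taken from the text; a numerical check of this kind illustrates the statements but of course proves nothing.

```python
from fractions import Fraction

def det(A):
    """Determinant by column reduction (interchanges and replacements only)."""
    M = [[Fraction(x) for x in row] for row in A]
    n, result = len(M), Fraction(1)
    for i in range(n):
        j = next((c for c in range(i, n) if M[i][c] != 0), None)
        if j is None:
            return Fraction(0)
        if j != i:
            for row in M:
                row[i], row[j] = row[j], row[i]
            result = -result          # a column interchange flips the sign
        for c in range(i + 1, n):
            f = M[i][c] / M[i][i]
            for row in M:
                row[c] -= f * row[i]  # column replacement, determinant unchanged
        result *= M[i][i]
    return result

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # hypothetical examples
B = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]

print(det(B), det(transpose(B)))
print(det(matmul(A, B)), det(A) * det(B))
```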
To prove both theorems we need the following lemma.


Lemma 3.6. For a square matrix $A$ and an elementary matrix $E$ (of the
same size)
$$\det(AE) = (\det A)(\det E).$$


Proof. The proof can be done just by direct checking: determinants of 
special matrices are easy to compute; right multiplication by an elemen- 
tary matrix is a column operation, and effect of column operations on the 
determinant is well known. 

This can look like a lucky coincidence, that the determinants of elemen- 
tary matrices agree with the corresponding column operations, but it is not 
a coincidence at all. 

Namely, for a column operation the corresponding elementary matrix 
can be obtained from the identity matrix I by this column operation. So, its 
determinant is 1 (determinant of I) times the effect of the column operation. 

And that is all! It may be hard to realize at first, but the above para- 
graph is a complete and rigorous proof of the lemma! 


Applying N times Lemma 3.6 we get the following corollary. 


Corollary 3.7. For any matrix A and any sequence of elementary matrices
E_1, E_2, ..., E_N (all matrices are n x n)


det(A E_1 E_2 ... E_N) = (det A)(det E_1)(det E_2) ... (det E_N).


Lemma 3.8. Any invertible matrix is a product of elementary matrices.


Proof. We know that any invertible matrix is row equivalent to the identity
matrix, which is its reduced echelon form. So
I = E_N E_{N-1} ... E_2 E_1 A,


and therefore any invertible matrix can be represented as a product of
elementary matrices,


A = E_1^{-1} E_2^{-1} ... E_{N-1}^{-1} E_N^{-1}


(the inverse of an elementary matrix is an elementary matrix).


Proof of Theorem 3.4. First of all, it can be easily checked that for an
elementary matrix E we have det E = det(E^T). Notice that it is sufficient to
prove the theorem only for invertible matrices A, since if A is not invertible
then A^T is also not invertible, and both determinants are zero.

By Lemma 3.8 the matrix A can be represented as a product of elementary
matrices,
A = E_1 E_2 ... E_N,
and by Corollary 3.7 the determinant of A is the product of determinants
of the elementary matrices. Since taking the transpose just transposes
each elementary matrix and reverses their order, i.e. A^T = E_N^T ... E_2^T E_1^T,
and det(E_k^T) = det E_k, Corollary 3.7 implies that
det A = det A^T.


Proof of Theorem 3.5. Let us first suppose that the matrix B is invertible.
Then Lemma 3.8 implies that B can be represented as a product of
elementary matrices

B = E_1 E_2 ... E_N,
and so by Corollary 3.7


det(AB) = (det A)[(det E_1)(det E_2) ... (det E_N)] = (det A)(det B).


If B is not invertible, then the product AB is also not invertible, and 
the theorem just says that 0 = 0. 

To check that the product AB = C is not invertible, let us assume that
it is invertible. Then multiplying the identity AB = C by C^{-1} from the left,
we get C^{-1}AB = I, so C^{-1}A is a left inverse of B. So B is left invertible,
and since it is square, it is invertible. We got a contradiction.
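
As a quick sanity check of Lemma 3.6 and Theorems 3.4 and 3.5, one can verify the identities numerically on small examples. The sketch below is an illustration only: the helper names `det2`, `matmul` and `transpose` are hypothetical, and it is restricted to 2 x 2 matrices, where the determinant is ad - bc.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(m):
    return [list(row) for row in zip(*m)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
E = [[1, 0], [5, 1]]  # elementary matrix: adds 5 times column 2 to column 1

print(det2(matmul(A, E)), det2(A) * det2(E))  # -2 -2
print(det2(matmul(A, B)), det2(A) * det2(B))  # 2 2
print(det2(transpose(A)), det2(A))            # -2 -2
```

A numerical check of this kind proves nothing, of course; it only confirms that the statements behave as claimed on the chosen matrices.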


3.6. Summary of properties of determinant. First of all, let us say
once more, that the determinant is defined only for square matrices! Since
we now know that det A = det(A^T), the statements that we knew about
columns are true for rows too.


1. Determinant is linear in each row (column) when the other rows 
(columns) are fixed. 

2. If one interchanges two rows (columns) of a matrix A, the determi- 
nant changes sign. 

3. For a triangular (in particular, for a diagonal) matrix its determinant
is the product of the diagonal entries. In particular, det I = 1.

4. If a matrix A has a zero row (or column), det A = 0.

5. If a matrix A has two equal rows (columns), det A = 0.

6. If one of the rows (columns) of A is a linear combination of the other 
rows (columns), i.e. if the matrix is not invertible, then det A = 0; 

More generally, 

7. det A = 0 if and only if A is not invertible, or equivalently 

8. det A ≠ 0 if and only if A is invertible.

9. det A does not change if we add to a row (column) a linear combination
of the other rows (columns). In particular, the determinant
is preserved under the row (column) replacement, i.e. under the row
(column) operation of the third kind.

10. det A^T = det A.

11. det(AB) = (det A)(det B).

And finally,

12. If A is an n x n matrix, then det(aA) = a^n det A.


The last property follows from the linearity of the determinant, if we
recall that to multiply a matrix A by a we have to multiply each row by a,
and that each multiplication multiplies the determinant by a.


Exercises. 


3.1. If A is an n x n matrix, how are the determinants det A and det(5A) related? 
Remark: det(5A) = 5det A only in the trivial case of 1 x 1 matrices 


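3.2. How are the determinants det A and det B related if

a)
\[
A = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}, \qquad
B = \begin{pmatrix} 2a_1 & 3a_2 & 5a_3 \\ 2b_1 & 3b_2 & 5b_3 \\ 2c_1 & 3c_2 & 5c_3 \end{pmatrix};
\]
b)
\[
A = \begin{pmatrix} a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \\ c_1 & c_2 & c_3 \end{pmatrix}, \qquad
B = \begin{pmatrix} 3a_1 & 4a_2 + 5a_1 & 5a_3 \\ 3b_1 & 4b_2 + 5b_1 & 5b_3 \\ 3c_1 & 4c_2 + 5c_1 & 5c_3 \end{pmatrix}.
\]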


3.3. Using column or row operations compute the determinants 


01 2 12 3 BW 2o-3 
a ii oa 
See ae Ae 6 |, 
23 0 78 9 be ee ty 
23 01 


3.4. A square (n x n) matrix is called skew-symmetric (or antisymmetric) if A^T =
-A. Prove that if A is skew-symmetric and n is odd, then det A = 0. Is this true
for even n?


3.5. A square matrix is called nilpotent if A^k = 0 for some positive integer k. Show
that for a nilpotent matrix A, det A = 0.


3.6. Prove that if the matrices A and B are similar, then det A = det B.


3.7. A real square matrix Q is called orthogonal if Q^T Q = I. Prove that if Q is an
orthogonal matrix then det Q = ±1.


3.8. Show that 


This is a particular case of the so-called Vandermonde determinant. 

3.9. Let points A, B and C in the plane R^2 have coordinates (x_1, y_1), (x_2, y_2) and
(x_3, y_3) respectively. Show that the area of triangle ABC is the absolute value of

\[
\frac{1}{2} \begin{vmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ 1 & x_3 & y_3 \end{vmatrix}.
\]

Hint: use row operations and geometric interpretation of 2 x 2 determinants (area).


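3.10. Let A be a square matrix. Show that block triangular matrices
\[
\begin{pmatrix} I & * \\ 0 & A \end{pmatrix}, \quad
\begin{pmatrix} A & * \\ 0 & I \end{pmatrix}, \quad
\begin{pmatrix} I & 0 \\ * & A \end{pmatrix}, \quad
\begin{pmatrix} A & 0 \\ * & I \end{pmatrix}
\]
all have determinant equal to det A. Here * can be anything.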
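
The following problems illustrate the power of block matrix notation.

3.11. Use the previous problem to show that if A and C are square matrices, then
\[
\det \begin{pmatrix} A & B \\ 0 & C \end{pmatrix} = \det A \det C.
\]
Hint:
\[
\begin{pmatrix} A & B \\ 0 & C \end{pmatrix}
  = \begin{pmatrix} I & B \\ 0 & C \end{pmatrix} \begin{pmatrix} A & 0 \\ 0 & I \end{pmatrix}.
\]


3.12. Let A be m x n and B be n x m matrices. Prove that
\[
\det \begin{pmatrix} 0 & A \\ -B & I \end{pmatrix} = \det(AB).
\]
Hint: While it is possible to transform the matrix by row operations to a form
where the determinant is easy to compute, the easiest way is to right multiply the
matrix by
\[
\begin{pmatrix} I & 0 \\ B & I \end{pmatrix}.
\]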


4. Formal definition. Existence and uniqueness of the 
determinant. 


In this section we arrive at the formal definition of the determinant. We
show that a function satisfying the basic properties 1, 2, 3 from Section 3
exists, and moreover, such a function is unique, i.e. we do not have any choice
in constructing the determinant.

Consider an n x n matrix A = {a_{j,k}}_{j,k=1}^n and let v_1, v_2, ..., v_n be its
columns, i.e.

\[
v_k = \begin{pmatrix} a_{1,k} \\ a_{2,k} \\ \vdots \\ a_{n,k} \end{pmatrix}
    = a_{1,k} e_1 + a_{2,k} e_2 + \ldots + a_{n,k} e_n = \sum_{j=1}^n a_{j,k} e_j.
\]
Using linearity of the determinant we expand it in the first column v_1:

\[
D(v_1, v_2, \ldots, v_n) = D\Big( \sum_{j=1}^n a_{j,1} e_j, v_2, \ldots, v_n \Big)
  = \sum_{j=1}^n a_{j,1} D(e_j, v_2, \ldots, v_n).   \tag{4.1}
\]
Then we expand it in the second column, then in the third, and so on. We
get

\[
D(v_1, v_2, \ldots, v_n) = \sum_{j_1=1}^n \sum_{j_2=1}^n \cdots \sum_{j_n=1}^n
  a_{j_1,1} a_{j_2,2} \cdots a_{j_n,n} D(e_{j_1}, e_{j_2}, \ldots, e_{j_n}).
\]

Notice that we have to use a different index of summation for each column:
we call them j_1, j_2, ..., j_n; the index j_1 here is the same as the index j in
(4.1).

It is a huge sum, it contains n^n terms. Fortunately, some of the terms are
zero. Namely, if any 2 of the indices j_1, j_2, ..., j_n coincide, the determinant
D(e_{j_1}, e_{j_2}, ..., e_{j_n}) is zero, because there are two equal columns here.


So, let us rewrite the sum, omitting all zero terms. The most convenient
way to do that is using the notion of a permutation. Informally, a permutation
of an ordered set {1,2,...,n} is a rearrangement of its elements.
A convenient formal way to represent such a rearrangement is by using a
function

σ: {1,2,...,n} → {1,2,...,n},
where σ(1),σ(2),...,σ(n) gives the new order of the set 1,2,...,n. In
other words, the permutation σ rearranges the ordered set 1,2,...,n into
σ(1),σ(2),...,σ(n).

Such a function σ has to be one-to-one (different values for different
arguments) and onto (assumes all possible values from the target space). The
functions which are one-to-one and onto are called bijections, and they give
a one-to-one correspondence between the domain and the target space.¹


Although it is not directly relevant here, let us notice, that it is well- 
known in combinatorics, that the number of different permutations of the set 
{1,2,...,n} is exactly n!. The set of all permutations of the set {1,2,...,n} 
will be denoted Perm(n). 


1 There is another canonical way to represent a permutation by a bijection σ, namely in this
representation σ(k) gives the new position of the element number k. In this representation σ rearranges
σ(1),σ(2),...,σ(n) into 1,2,...,n.

While in the first representation it is easy to write the function if you know the rearrangement
of the set 1,2,...,n, the second one is more adapted to the composition of permutations: it
coincides with the composition of functions. Namely if we first perform the permutation that
corresponds to a function σ and then one that corresponds to τ, the resulting permutation will
correspond to τ∘σ.

Using the notion of a permutation, we can rewrite the determinant as

\[
D(v_1, v_2, \ldots, v_n) = \sum_{\sigma \in \mathrm{Perm}(n)}
  a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n} D(e_{\sigma(1)}, e_{\sigma(2)}, \ldots, e_{\sigma(n)}).
\]

The matrix with columns e_{σ(1)}, e_{σ(2)}, ..., e_{σ(n)} can be obtained from the
identity matrix by finitely many column interchanges, so the determinant

D(e_{σ(1)}, e_{σ(2)}, ..., e_{σ(n)})

is 1 or -1 depending on the number of column interchanges.


To formalize that, we (informally) define the sign (denoted sign σ) of
a permutation σ to be 1 if an even number of interchanges is necessary to
rearrange the n-tuple 1,2,...,n into σ(1),σ(2),...,σ(n), and sign(σ) = -1
if the number of interchanges is odd.

It is a well-known fact from combinatorics that the sign of a permutation
is well defined, i.e. that although there are infinitely many ways to get
the n-tuple σ(1),σ(2),...,σ(n) from 1,2,...,n, the number of interchanges
is either always odd or always even.


One of the ways to show that is to introduce an alternative definition.
Let K = K(σ) be the number of disorders of σ, i.e. the number of pairs
(j,k), j,k ∈ {1,2,...,n}, j < k such that σ(j) > σ(k), and see if the
number is even or odd. We call the permutation σ odd if K is odd and even
if K is even. Then define sign σ := (-1)^{K(σ)}; note that this way sign σ is
well defined.


We want to show that sign σ = (-1)^{K(σ)} can indeed be computed by
rearranging the n-tuple 1,2,...,n into σ(1),σ(2),...,σ(n) and counting the
number of interchanges, as was described above.

If σ(k) = k for all k, then the number of disorders K(σ) is 0, so the sign of such
an identity permutation is 1. Note also that any elementary transpose, which
interchanges two neighbors, changes the sign of a permutation, because it
changes (increases or decreases) the number of disorders exactly by 1. So,
to get from a permutation to another one always needs an even number of
elementary transposes if the permutations have the same sign, and an odd
number if the signs are different.

Finally, any interchange of two entries can be achieved by an odd number
of elementary transposes: if the entries are d positions apart, d elementary
transposes move the first entry to the position of the second, and d - 1 more
move the second entry back to the position of the first, 2d - 1 in total.
This implies that the sign changes under an interchange of two entries.
So, to get from 1,2,...,n to an even permutation
(positive sign) one always needs an even number of interchanges, and an odd
number of interchanges is needed to get an odd permutation (negative sign).
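
The two descriptions of the sign can be compared directly in code. The following Python sketch is illustrative only (the function names are hypothetical): a permutation is given as the tuple (σ(1), ..., σ(n)), the first function counts disorders, and the second one sorts the tuple back to 1,2,...,n by interchanges of two entries and counts them.

```python
def sign_by_disorders(perm):
    """Sign via the number of pairs j < k with sigma(j) > sigma(k)."""
    n = len(perm)
    disorders = sum(1 for j in range(n) for k in range(j + 1, n) if perm[j] > perm[k])
    return -1 if disorders % 2 else 1


def sign_by_interchanges(perm):
    """Sign via counting interchanges needed to sort perm into (1, ..., n)."""
    p = list(perm)
    swaps = 0
    for i in range(len(p)):
        while p[i] != i + 1:
            target = p[i] - 1  # position where the value p[i] belongs
            p[i], p[target] = p[target], p[i]
            swaps += 1
    return -1 if swaps % 2 else 1


print(sign_by_disorders((2, 3, 1)), sign_by_interchanges((2, 3, 1)))  # 1 1
print(sign_by_disorders((2, 1, 3)), sign_by_interchanges((2, 1, 3)))  # -1 -1
```

Undoing a rearrangement takes the same interchanges in reverse order, so counting the interchanges needed to sort the tuple gives the same parity as counting those needed to produce it.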

So, if we want the determinant to satisfy basic properties 1-3 from Section
3, we must define it as

\[
\det A = \sum_{\sigma \in \mathrm{Perm}(n)} a_{\sigma(1),1} a_{\sigma(2),2} \cdots a_{\sigma(n),n} \operatorname{sign}(\sigma),   \tag{4.2}
\]
where the sum is taken over all permutations of the set {1,2,...,n}.


If we define the determinant this way, it is easy to check that it satisfies 
the basic properties 1-3 from Section 3. Indeed, it is linear in each column, 
because for each column every term (product) in the sum contains exactly 
one entry from this column. 


Interchanging two columns of A just adds an extra interchange to the 
permutation, so right side in (4.2) changes sign. Finally, for the identity 
matrix I, the right side of (4.2) is 1 (it has one non-zero term). 
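
Formula (4.2) translates almost literally into code. The sketch below is only an illustration (the names `sign` and `det_by_permutations` are hypothetical, and indices are 0-based as usual in Python); it enumerates all n! permutations, so it is usable only for very small matrices.

```python
from itertools import permutations
from math import prod


def sign(perm):
    """Sign of a permutation via the number of disorders."""
    n = len(perm)
    disorders = sum(1 for j in range(n) for k in range(j + 1, n) if perm[j] > perm[k])
    return -1 if disorders % 2 else 1


def det_by_permutations(a):
    """Sum over all permutations sigma of a[sigma(1)][1] ... a[sigma(n)][n] * sign(sigma)."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        # The k-th factor is taken from column k and row sigma(k).
        total += sign(sigma) * prod(a[sigma[k]][k] for k in range(n))
    return total


print(det_by_permutations([[1, 2], [3, 4]]))                   # -2
print(det_by_permutations([[2, 1, 0], [1, 3, 1], [0, 1, 4]]))  # 18
```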


Exercises. 
4.1. Suppose the permutation σ takes (1,2,3,4,5) to (5,4,1,2,3).

a) Find the sign of σ;
b) What does σ^2 := σ∘σ do to (1,2,3,4,5)?
c) What does the inverse permutation σ^{-1} do to (1,2,3,4,5)?
d) What is the sign of σ^{-1}?


4.2. Let P be a permutation matrix, i.e. an n x n matrix consisting of zeroes and
ones and such that there is exactly one 1 in every row and every column.


a) Can you describe the corresponding linear transformation? That will
explain the name.
b) Show that P is invertible. Can you describe P^{-1}?
c) Show that for some N > 0
\[
P^N := \underbrace{P P \cdots P}_{N \text{ times}} = I.
\]
Use the fact that there are only finitely many permutations.


4.3. Why is there an even number of permutations of (1,2,...,9) and why are 
exactly half of them odd permutations? Hint: This problem can be hard to solve 
in terms of permutations, but there is a very simple solution using determinants. 


4.4. If σ is an odd permutation, explain why σ^2 is even but σ^{-1} is odd.


4.5. How many multiplications and additions are required to compute the
determinant using formal definition (4.2) of the determinant of an n x n matrix? Do not
count the operations needed to compute sign σ.


5. Cofactor expansion.


For an n x n matrix A = {a_{j,k}}_{j,k=1}^n let A_{j,k} denote the (n - 1) x (n - 1)
matrix obtained from A by crossing out row number j and column number k.


Theorem 5.1 (Cofactor expansion of determinant). Let A be an n x n
matrix. For each j, 1 ≤ j ≤ n, the determinant of A can be expanded in the
row number j as

\[
\det A = a_{j,1} (-1)^{j+1} \det A_{j,1} + a_{j,2} (-1)^{j+2} \det A_{j,2} + \ldots + a_{j,n} (-1)^{j+n} \det A_{j,n}
       = \sum_{k=1}^n a_{j,k} (-1)^{j+k} \det A_{j,k}.
\]

Similarly, for each k, 1 ≤ k ≤ n, the determinant can be expanded in the
column number k,

\[
\det A = \sum_{j=1}^n a_{j,k} (-1)^{j+k} \det A_{j,k}.
\]


Proof. Let us first prove the formula for the expansion in row number 1.
The formula for expansion in row number 2 then can be obtained from it
by interchanging rows number 1 and 2. Interchanging then rows number 2
and 3 we get the formula for the expansion in row number 3, and so on.
Since det A = det A^T, column expansion follows automatically.

Let us first consider a special case, when the first row has one non-zero
term a_{1,1}. Performing column operations on columns 2,3,...,n we
transform A to the lower triangular form. The determinant of A then can
be computed as

(the product of diagonal entries of the triangular matrix) x (correcting factor from the column operations).

But the product of all diagonal entries except the first one (i.e. without
a_{1,1}) times the correcting factor is exactly det A_{1,1}, so in this particular case
det A = a_{1,1} det A_{1,1}.

Let us now consider the case when all entries in the first row except a_{1,2}
are zeroes. This case can be reduced to the previous one by interchanging
columns number 1 and 2, and therefore in this case det A = (-1) a_{1,2} det A_{1,2}.

The case when a_{1,3} is the only non-zero entry in the first row can be
reduced to the previous one by interchanging columns 2 and 3, so in this case
det A = a_{1,3} det A_{1,3}.

Repeating this procedure we get that in the case when a_{1,k} is the only
non-zero entry in the first row det A = (-1)^{1+k} a_{1,k} det A_{1,k}.²


In the general case, linearity of the determinant in each row implies that

\[
\det A = \det A^{(1)} + \det A^{(2)} + \ldots + \det A^{(n)} = \sum_{k=1}^n \det A^{(k)},
\]

where the matrix A^{(k)} is obtained from A by replacing all entries in the first
row except a_{1,k} by 0. As we just discussed above

\[
\det A^{(k)} = (-1)^{1+k} a_{1,k} \det A_{1,k},
\]
so

\[
\det A = \sum_{k=1}^n (-1)^{1+k} a_{1,k} \det A_{1,k}.
\]

To get the cofactor expansion in the second row, we can interchange
the first and second rows and apply the above formula. The row exchange
changes the sign, so we get

\[
\det A = -\sum_{k=1}^n (-1)^{1+k} a_{2,k} \det A_{2,k} = \sum_{k=1}^n (-1)^{2+k} a_{2,k} \det A_{2,k}.
\]

Exchanging rows 3 and 2 and expanding in the second row we get the formula
\[
\det A = \sum_{k=1}^n (-1)^{3+k} a_{3,k} \det A_{3,k},
\]

and so on.


To expand the determinant det A in a column one needs to apply the row
expansion formula for A^T.


Definition. The numbers
C_{j,k} = (-1)^{j+k} det A_{j,k}


are called cofactors.
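
The row expansion of Theorem 5.1 can be written as a short recursive function. The Python sketch below is an illustration under stated assumptions (the names `minor_matrix` and `det_cofactor` are hypothetical, and rows and columns are numbered from 0, which does not change the parity of j + k); it expands along a chosen row and evaluates the smaller determinants by expansion along their first row.

```python
def minor_matrix(a, j, k):
    """Matrix obtained from a by crossing out row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(a) if i != j]


def det_cofactor(a, row=0):
    """Determinant by cofactor expansion along the given row (0-based)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum((-1) ** (row + k) * a[row][k] * det_cofactor(minor_matrix(a, row, k))
               for k in range(n))


a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
print([det_cofactor(a, row) for row in range(3)])  # [18, 18, 18]
```

Expanding along any of the three rows gives the same value, as the theorem asserts.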


2 In the case when a_{1,k} is the only non-zero entry in the first row it may be tempting to
exchange columns number 1 and number k, to reduce the problem to the case a_{1,1} ≠ 0. However,
when we exchange columns 1 and k we change the order of other columns: if we just cross out
column number k, then column number 1 will be the first of the remaining columns. But, if
we exchange columns 1 and k, and then cross out column k (which is now the first one), then
the column 1 will be now column number k - 1. To avoid the complications of keeping track of
the order of columns, we can, as we did above, exchange columns number k and k - 1, reducing
everything to the situation we treated on the previous step. Such an operation does not change
the order for the rest of the columns.


Using this notation, the formula for expansion of the determinant in the
row number j can be rewritten as

\[
\det A = a_{j,1} C_{j,1} + a_{j,2} C_{j,2} + \ldots + a_{j,n} C_{j,n} = \sum_{k=1}^n a_{j,k} C_{j,k}.
\]
Similarly, expansion in the column number k can be written as

\[
\det A = a_{1,k} C_{1,k} + a_{2,k} C_{2,k} + \ldots + a_{n,k} C_{n,k} = \sum_{j=1}^n a_{j,k} C_{j,k}.
\]

Remark. Very often the cofactor expansion formula is used as the definition 
of determinant. It is not difficult to show that the quantity given by this 
formula satisfies the basic properties of the determinant: the normalization 
property is trivial, the proof of antisymmetry is easy. However, the proof of 
linearity is a bit tedious (although not too difficult). 


Remark. Although it looks very nice, the cofactor expansion formula is not 
suitable for computing determinant of matrices bigger than 3 x 3. 


As one can count, it requires more than n! multiplications (to be precise it
requires \sum_{k=1}^{n-1} n!/k! multiplications), and n! grows very rapidly. For example,
cofactor expansion of a 20 x 20 matrix requires more than 20! ≈ 2.4 · 10^18
multiplications. It would take a computer performing a billion multiplications
per second over 77 years to perform 20! multiplications; performing
the multiplications required for the cofactor expansion of the determinant
of a 20 x 20 matrix will require more than 132 years.³


On the other hand, computing the determinant of an n x n matrix using
row reduction requires (n^3 + 2n - 3)/3 multiplications (and about the same
number of additions). It would take a computer performing a million operations
per second (very slow, by today's standards) a fraction of a second
to compute the determinant of a 100 x 100 matrix by row reduction.
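
The numbers quoted above are easy to reproduce. The following Python sketch is only a check of the arithmetic (the function names are hypothetical, and a year is taken to be 365.25 days); it evaluates the two operation counts stated in the text.

```python
from math import factorial


def cofactor_multiplications(n):
    """The count sum_{k=1}^{n-1} n!/k! quoted in the text."""
    return sum(factorial(n) // factorial(k) for k in range(1, n))


def row_reduction_multiplications(n):
    """The count (n^3 + 2n - 3)/3 quoted in the text."""
    return (n ** 3 + 2 * n - 3) // 3


SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

# Time at a billion multiplications per second, in years.
years_factorial = factorial(20) / 1e9 / SECONDS_PER_YEAR
years_cofactor = cofactor_multiplications(20) / 1e9 / SECONDS_PER_YEAR
print(int(years_factorial), int(years_cofactor))  # 77 132

# Time for a 100 x 100 matrix at a million multiplications per second, in seconds.
print(row_reduction_multiplications(100) / 1e6)  # 0.333399
```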


It can only be practical to apply the cofactor expansion formula in higher 
dimensions if a row (or a column) has a lot of zero entries. 


However, the cofactor expansion formula is of great theoretical impor- 
tance, as the next section shows. 


5.1. Cofactor formula for the inverse matrix. The matrix C =
{C_{j,k}}_{j,k=1}^n whose entries are cofactors of a given matrix A is called the
cofactor matrix of A.


Theorem 5.2. Let A be an invertible matrix and let C be its cofactor matrix.
Then
\[
A^{-1} = \frac{1}{\det A} C^T.
\]


3 The reader can check the numbers using, for example, WolframAlpha.

Proof. Let us find the product AC^T. The diagonal entry number j is
obtained by multiplying jth row of A by jth column of C^T (i.e. jth row of
C), so


(AC^T)_{j,j} = a_{j,1} C_{j,1} + a_{j,2} C_{j,2} + ... + a_{j,n} C_{j,n} = det A,
by the cofactor expansion formula.


To get the off-diagonal terms we need to multiply kth row of A by jth
column of C^T, j ≠ k, to get


a_{k,1} C_{j,1} + a_{k,2} C_{j,2} + ... + a_{k,n} C_{j,n}.


It follows from the cofactor expansion formula (expanding in jth row) that
this is the determinant of the matrix obtained from A by replacing row
number j by the row number k (and leaving all other rows as they were).
But the rows j and k of this matrix coincide, so the determinant is 0. So, all
off-diagonal entries of AC^T are zeroes (and all diagonal ones equal det A),
thus
AC^T = (det A) I.

That means that the matrix (1/det A) C^T is a right inverse of A, and since A is
square, it is the inverse.
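
The cofactor formula for the inverse can be tried out on small matrices. The Python sketch below is an illustration only: the helper names are hypothetical, determinants are computed by first-row cofactor expansion, the determinant of the empty (0 x 0) matrix is taken to be 1 so that 1 x 1 matrices work, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def minor_matrix(a, j, k):
    """Cross out row j and column k (0-based)."""
    return [row[:k] + row[k + 1:] for i, row in enumerate(a) if i != j]


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if not a:
        return 1  # convention for the empty matrix
    return sum((-1) ** k * a[0][k] * det(minor_matrix(a, 0, k)) for k in range(len(a)))


def cofactor_matrix(a):
    n = len(a)
    return [[(-1) ** (j + k) * det(minor_matrix(a, j, k)) for k in range(n)]
            for j in range(n)]


def inverse_by_cofactors(a):
    """Inverse as (1/det A) times the transpose of the cofactor matrix."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    c = cofactor_matrix(a)
    n = len(a)
    # Entry (j, k) of the inverse is C[k][j] / det A.
    return [[Fraction(c[k][j]) / d for k in range(n)] for j in range(n)]


print(inverse_by_cofactors([[2, 1], [5, 3]]))
```

For the matrix in the last line det A = 1, so the inverse has integer entries: its rows are (3, -1) and (-5, 2).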


Recalling that for an invertible matrix A the equation Ax = b has a
unique solution

\[
x = A^{-1} b = \frac{1}{\det A} C^T b,
\]

we get the following corollary of the above theorem.


Corollary 5.3 (Cramer's rule). For an invertible matrix A the entry number
k of the solution of the equation Ax = b is given by the formula
\[
x_k = \frac{\det B_k}{\det A},
\]
where the matrix B_k is obtained from A by replacing column number k of A
by the vector b.
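
Cramer's rule is short enough to state in code. The following Python sketch is illustrative only (the names `det` and `cramer_solve` are hypothetical, and the determinant is computed by first-row cofactor expansion, which is adequate only for small systems).

```python
from fractions import Fraction


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
               for k in range(len(a)))


def cramer_solve(a, b):
    """Solve Ax = b for invertible A: x_k = det(B_k) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is not invertible")
    solution = []
    for k in range(len(a)):
        # B_k: replace column number k of A by the vector b.
        b_k = [row[:k] + [b[i]] + row[k + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(b_k)) / d)
    return solution


print(cramer_solve([[2, 1], [5, 3]], [1, 2]))  # [Fraction(1, 1), Fraction(-1, 1)]
```

Indeed, 2 · 1 + 1 · (-1) = 1 and 5 · 1 + 3 · (-1) = 2.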


5.2. Some applications of the cofactor formula for the inverse. 


Example (Inverting 2 x 2 matrices). The cofactor formula really shines
when one needs to invert a 2 x 2 matrix

\[
A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]

The cofactors are just entries (1 x 1 matrices), the cofactor matrix is

\[
\begin{pmatrix} d & -c \\ -b & a \end{pmatrix},
\]
so the inverse matrix A^{-1} is given by the formula

\[
A^{-1} = \frac{1}{\det A} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]


While the cofactor formula for the inverse does not look practical for 
dimensions higher than 3, it has a great theoretical value, as the examples 
below illustrate. 


Example (Matrix with integer inverse). Suppose that we want to construct
a matrix A with integer entries, such that its inverse also has integer entries
(inverting such a matrix would make a nice homework problem: no messing
with fractions). If det A = 1 and its entries are integers, the cofactor formula
for inverses implies that A^{-1} also has integer entries.


Note, that it is easy to construct an integer matrix A with det A = 1: 
one should start with a triangular matrix with 1 on the main diagonal, and 
then apply several row or column replacements (operations of the third type) 
to make the matrix look generic. 


Example (Inverse of a polynomial matrix). Another example is to consider
a polynomial matrix A(x), i.e. a matrix whose entries are not numbers but
polynomials a_{j,k}(x) of the variable x. If det A(x) = 1, then the inverse
matrix A^{-1}(x) is also a polynomial matrix.

If det A(x) = p(x) ≢ 0, it follows from the cofactor expansion that p(x)
is a polynomial, so A^{-1}(x) has rational entries: moreover, p(x) is a multiple
of each denominator.


Exercises. 
5.1. Evaluate the determinants using any method 


1 =2 3-12 


: : zs -5 12 -14 19 
aa -9 22 -20 31 


-4 9 —-14 15 


5.2. Use row (column) expansion to evaluate the determinants. Note, that you 
don’t need to use the first row (column): picking row (column) with many zeroes 
will simplify your calculations. 


ees do SGis8o A 
2. be OA 30 
Td 88s] 5 ¢ ¢ 
ies Gee Sit 8 
ee ees ae 

5.3. For the n x n matrix

\[
A = \begin{pmatrix}
0 & 0 & 0 & \ldots & 0 & a_0 \\
-1 & 0 & 0 & \ldots & 0 & a_1 \\
0 & -1 & 0 & \ldots & 0 & a_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \ldots & 0 & a_{n-2} \\
0 & 0 & 0 & \ldots & -1 & a_{n-1}
\end{pmatrix}
\]

compute det(A + tI), where I is the n x n identity matrix. You should get a nice
expression involving a_0, a_1, ..., a_{n-1} and t. Row expansion and induction is probably
the best way to go.


5.4. Using cofactor formula compute inverses of the matrices 


Cs :) er 3) (ae) ik 
3.4 a ee 305 Goh 


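5.5. Let D_n be the determinant of the n x n tridiagonal matrix

\[
\begin{pmatrix}
1 & -1 & & & 0 \\
1 & 1 & -1 & & \\
 & \ddots & \ddots & \ddots & \\
 & & 1 & 1 & -1 \\
0 & & & 1 & 1
\end{pmatrix}.
\]

Using cofactor expansion show that D_n = D_{n-1} + D_{n-2}. This yields that the
sequence D_n is the Fibonacci sequence 1, 2, 3, 5, 8, 13, 21, ...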


5.6. Vandermonde determinant revisited. Our goal is to prove the formula

\[
\begin{vmatrix}
1 & c_0 & c_0^2 & \ldots & c_0^n \\
1 & c_1 & c_1^2 & \ldots & c_1^n \\
\vdots & \vdots & \vdots & & \vdots \\
1 & c_n & c_n^2 & \ldots & c_n^n
\end{vmatrix}
= \prod_{0 \le j < k \le n} (c_k - c_j)
\]

for the (n+1) x (n+1) Vandermonde determinant.
We will apply induction. To do this

a) Check that the formula holds for n = 1, n = 2.

b) Call the variable c_n in the last row x, and show that the determinant is a
polynomial of degree n, A_0 + A_1 x + A_2 x^2 + ... + A_n x^n, with the coefficients
A_k depending on c_0, c_1, ..., c_{n-1}.

c) Show that the polynomial has zeroes at x = c_0, c_1, ..., c_{n-1}, so it can be
represented as A_n · (x - c_0)(x - c_1)...(x - c_{n-1}), where A_n is as above.

d) Assuming that the formula for the Vandermonde determinant is true for
n - 1, compute A_n and prove the formula for n.


5.7. How many multiplications are needed to compute the determinant of an n x n
matrix using the cofactor expansion? Prove the formula.


6. Minors and rank.


For a matrix A let us consider its k x k submatrix, obtained by taking k rows
and k columns. The determinant of this matrix is called a minor of order
k. Note that an m x n matrix has \binom{m}{k} · \binom{n}{k} different k x k submatrices, and
so it has \binom{m}{k} · \binom{n}{k} minors of order k.

Theorem 6.1. For a non-zero matrix A its rank equals the maximal
integer k such that there exists a non-zero minor of order k.


Proof. Let us first show, that if k > rank A then all minors of order k are 0. 
Indeed, since the dimension of the column space Ran A is rank A < k, any 
k columns of A are linearly dependent. Therefore, for any k x k submatrix 
of A its columns are linearly dependent, and so all minors of order k are 0. 


To complete the proof we need to show that there exists a non-zero 
minor of order k = rank A. There can be many such minors, but probably 
the easiest way to get such a minor is to take pivot rows and pivot columns 
(i.e. rows and columns of the original matrix, containing a pivot). This 
k x k submatrix has the same pivots as the original matrix, so it is invertible 


(pivot in every column and every row) and its determinant is non-zero. 
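
Theorem 6.1 can be illustrated by a brute-force computation. The Python sketch below is an illustration only (the names `det` and `rank_by_minors` are hypothetical, and exact integer or rational entries are assumed): it searches for the largest k admitting a non-zero minor of order k. It is far slower than row reduction and is meant only to make the statement concrete.

```python
from itertools import combinations


def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
               for k in range(len(a)))


def rank_by_minors(a):
    """Largest k such that some k x k minor of a is non-zero (0 for the zero matrix)."""
    m, n = len(a), len(a[0])
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                submatrix = [[a[i][j] for j in cols] for i in rows]
                if det(submatrix) != 0:
                    return k
    return 0


print(rank_by_minors([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))  # 2
```

In this example the determinant of the whole matrix is 0, while the minor formed by rows 1, 3 and columns 1, 2 equals -2.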


This theorem does not look very useful, because it is much easier to 
perform row reduction than to compute all minors. However, it is of great 
theoretical importance, as the following corollary shows. 


Corollary 6.2. Let A = A(x) be an m x n polynomial matrix (i.e. a matrix
whose entries are polynomials of x). Then rank A(x) is constant everywhere,
except maybe finitely many points, where the rank is smaller.


Proof. Let r be the largest integer such that rank A(x) = r for some x. To
show that such r exists, we first try r = min{m,n}. If there exists x such
that rank A(x) = r, we have found r. If not, we replace r by r - 1 and try
again. After finitely many steps we either stop or hit 0. So, r exists.


Let x_0 be a point such that rank A(x_0) = r, and let M be a minor of order
r such that M(x_0) ≠ 0 (such a minor exists by Theorem 6.1). Since M(x) is
the determinant of an r x r polynomial
matrix, M(x) is a polynomial. Since M(x_0) ≠ 0, it is not identically zero,
so it can be zero only at finitely many points. So, everywhere except maybe
finitely many points rank A(x) ≥ r. But by the definition of r, rank A(x) ≤ r
for all x.


7. Review exercises for Chapter 3. 


7.1. True or false 
a) Determinant is only defined for square matrices.

b) If two rows or columns of A are identical, then det A = 0.


c) If B is the matrix obtained from A by interchanging two rows (or columns), 
then det B = det A. 


d) If B is the matrix obtained from A by multiplying a row (column) of A by 
a scalar a, then det B = det A. 


e) If B is the matrix obtained from A by adding a multiple of a row to some 
other row, then det B = det A. 


f) The determinant of a triangular matrix is the product of its diagonal en- 
tries. 


g) det(A^T) = -det(A).

h) det(AB) = det(A) det(B).

i) A matrix A is invertible if and only if det A ≠ 0.

j) If A is an invertible matrix, then det(A^{-1}) = 1/ det(A).


7.2. Let A be an n x n matrix. How are det(3A), det(-A) and det(A^2) related to
det A?


7.3. If the entries of both A and A^{-1} are integers, is it possible that det A = 3?
Hint: what is det(A) det(A^{-1})?


7.4. Let v_1, v_2 be vectors in R^2 and let A be the 2 x 2 matrix with columns v_1, v_2.
Prove that |det A| is the area of the parallelogram with two sides given by the
vectors v_1, v_2.

Consider first the case when v_1 = (x_1, 0)^T. To treat the general case v_1 = (x_1, y_1)^T
left multiply A by a rotation matrix that transforms the vector v_1 into (x̃_1, 0)^T. Hint:
what is the determinant of a rotation matrix?


The following problem illustrates relation between the sign of the determinant 
and the so-called orientation of a system of vectors. 


7.5. Let v_1, v_2 be vectors in R^2. Show that D(v_1, v_2) > 0 if and only if there
exists a rotation T_γ such that the vector T_γ v_1 is parallel to e_1 (and looking in the
same direction), and T_γ v_2 is in the upper half-plane x_2 > 0 (the same half-plane
as e_2).


Hint: What is the determinant of a rotation matrix? 


Chapter 4 


Introduction to 
spectral theory 
(eigenvalues and 
eigenvectors) 


Spectral theory is the main tool that helps us to understand the structure
of a linear operator. In this chapter we consider only operators acting from
a vector space to itself (or, equivalently, n x n matrices). If we have such
a linear transformation A: V → V, we can multiply it by itself, take any
power of it, or any polynomial.

The main idea of spectral theory is to split the operator into simple 
blocks and analyze each block separately. 


To explain the main idea, let us consider difference equations. Many
processes can be described by the equations of the following type


x_{n+1} = A x_n, n = 0, 1, 2, ...,


where A: V → V is a linear transformation, and x_n is the state of the
system at the time n. Given the initial state x_0 we would like to know the
state x_n at the time n, analyze the long time behavior of x_n, etc.¹


1 The difference equations are discrete time analogues of the differential equation x'(t) =
Ax(t). To solve the differential equation, one needs to compute e^{tA} := \sum_{k=0}^∞ t^k A^k / k!, and
spectral theory also helps in doing this.

At the first glance the problem looks trivial: the solution x_n is given by
the formula x_n = A^n x_0. But what if n is huge: thousands, millions? Or
what if we want to analyze the behavior of x_n as n → ∞?


Here the idea of eigenvalues and eigenvectors comes in. Suppose that
Ax_0 = λx_0, where λ is some scalar. Then A^2 x_0 = λ^2 x_0, A^3 x_0 = λ^3 x_0, ...,
A^n x_0 = λ^n x_0, so the behavior of the solution is very well understood.
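
A tiny numerical experiment makes this concrete. In the Python sketch below (an illustration only; the matrix, the vector and the helper name `apply` are chosen for the example and do not come from the text), x_0 = (1, 1)^T is an eigenvector of A with eigenvalue 3, and iterating x_{n+1} = A x_n reproduces 3^n x_0 at every step.

```python
def apply(a, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


A = [[2, 1], [1, 2]]
x0 = [1, 1]  # eigenvector of A: A x0 = 3 x0
lam = 3

x = x0
for n in range(1, 11):
    x = apply(A, x)                               # x_n = A x_{n-1}
    assert x == [lam ** n * c for c in x0]        # equals lam^n x_0

print(x)  # [59049, 59049]
```

Knowing the eigenvalue, the state after ten steps is obtained from a single power, 3^10 = 59049, without any matrix multiplication.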

In this section we will consider only operators in finite-dimensional spac- 
es. Spectral theory in infinitely many dimensions is significantly more com- 
plicated, and most of the results presented here fail in infinite-dimensional 
setting. 


1. Main definitions 


1.1. Eigenvalues, eigenvectors, spectrum. A scalar λ is called an
eigenvalue of an operator A : V → V if there exists a non-zero vector
v ∈ V such that
Av = λv.

The vector v is called the eigenvector of A (corresponding to the eigenvalue
λ).

If we know that λ is an eigenvalue, the eigenvectors are easy to find: one
just has to solve the equation Ax = λx, or, equivalently


(A - λI)x = 0.
So, finding all eigenvectors corresponding to an eigenvalue λ is simply finding
the nullspace of A - λI. The nullspace Ker(A - λI), i.e. the set of all
eigenvectors and 0 vector, is called the eigenspace.


The set of all eigenvalues of an operator A is called the spectrum of A, and
is usually denoted σ(A).


1.2. Finding eigenvalues: characteristic polynomials. A scalar $\lambda$ is 
an eigenvalue if and only if the nullspace $\operatorname{Ker}(A - \lambda I)$ is non-trivial (so the 
equation $(A - \lambda I)\mathbf{x} = \mathbf{0}$ has a non-trivial solution). 

Let $A$ act on $\mathbb{F}^n$ (i.e. $A : \mathbb{F}^n \to \mathbb{F}^n$). Since the matrix of $A$ is square, 
$A - \lambda I$ has a non-trivial nullspace if and only if it is not invertible. We 
know that a square matrix is not invertible if and only if its determinant is 
0. Therefore 


$$\lambda \in \sigma(A), \text{ i.e. } \lambda \text{ is an eigenvalue of } A \iff \det(A - \lambda I) = 0.$$


If $A$ is an $n \times n$ matrix, the determinant $\det(A - \lambda I)$ is a polynomial of 
degree $n$ of the variable $\lambda$. This polynomial is called the characteristic 
polynomial of $A$. So, to find all eigenvalues of $A$ one just needs to compute 
the characteristic polynomial and find all its roots. 

This method of finding the spectrum of an operator is not very practical 
in higher dimensions. Finding roots of a polynomial of high degree can 
be a very difficult problem, and a general equation of degree higher than 4 
cannot be solved in radicals. So, in higher dimensions different numerical 
methods of finding eigenvalues and eigenvectors are used. 
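
In dimension 2 the recipe is completely explicit: $\det(A - \lambda I) = \lambda^2 - (\operatorname{trace} A)\lambda + \det A$, and the roots come from the quadratic formula. The sketch below illustrates this; the function names are chosen for the example only, and `cmath.sqrt` is used so that complex roots are returned when the discriminant is negative.

```python
import cmath

def char_poly_2x2(A):
    """Coefficients (1, p, q) of det(A - zI) = z^2 + p z + q for a 2 x 2 matrix."""
    (a, b), (c, d) = A
    return (1, -(a + d), a * d - b * c)

def eigenvalues_2x2(A):
    """Both roots of the characteristic polynomial (as complex numbers)."""
    _, p, q = char_poly_2x2(A)
    disc = cmath.sqrt(p * p - 4 * q)
    return ((-p + disc) / 2, (-p - disc) / 2)

A = [[1, 2], [8, 1]]            # illustrative matrix
print(char_poly_2x2(A))         # coefficients of z^2 - 2z - 15
print(eigenvalues_2x2(A))       # roots 5 and -3, returned as complex numbers
```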


1.3. Finding characteristic polynomial and eigenvalues of an ab- 
stract operator. So we know how to find the spectrum of a matrix. But 
how do we find eigenvalues of an operator acting in an abstract vector space? 
The recipe is simple: 


Take an arbitrary basis, and compute eigenvalues of the matrix of 
the operator in this basis. 


But how do we know that the result does not depend on a choice of the 
basis? 

There can be several possible explanations. One is based on the notion 
of similar matrices. Let us recall that square matrices $A$ and $B$ are called 
similar if there exists an invertible matrix $S$ such that 


$$A = SBS^{-1}.$$
Note that determinants of similar matrices coincide. Indeed 
$$\det A = \det(SBS^{-1}) = \det S \det B \det S^{-1} = \det B$$
because $\det S^{-1} = 1/\det S$. Note that if $A = SBS^{-1}$ then 
$$A - \lambda I = SBS^{-1} - \lambda SIS^{-1} = S(BS^{-1} - \lambda IS^{-1}) = S(B - \lambda I)S^{-1},$$
so the matrices $A - \lambda I$ and $B - \lambda I$ are similar. Therefore 
$$\det(A - \lambda I) = \det(B - \lambda I),$$


i.e. 


characteristic polynomials of similar matrices coincide. 


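If $T : V \to V$ is a linear transformation, and $\mathcal{A}$ and $\mathcal{B}$ are two bases in 
$V$, then 


$$[T]_{\mathcal{A}\mathcal{A}} = [I]_{\mathcal{A}\mathcal{B}} [T]_{\mathcal{B}\mathcal{B}} [I]_{\mathcal{B}\mathcal{A}}$$
and since $[I]_{\mathcal{B}\mathcal{A}} = ([I]_{\mathcal{A}\mathcal{B}})^{-1}$ the matrices $[T]_{\mathcal{A}\mathcal{A}}$ and $[T]_{\mathcal{B}\mathcal{B}}$ are similar. 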

In other words, matrices of a linear transformation in different bases are 
similar. 
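
The invariance of the characteristic polynomial under similarity can be checked on a concrete pair of matrices. The sketch below uses exact rational arithmetic from the standard `fractions` module; the matrices $B$ and $S$ and all function names are assumptions made up for the illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(S):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("S is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly_2x2(M):
    """Coefficients (1, -trace, det) of det(M - zI) = z^2 - (trace) z + det."""
    (a, b), (c, d) = M
    return (1, -(a + d), a * d - b * c)

B = [[2, 1], [0, 3]]   # illustrative matrix
S = [[1, 2], [3, 5]]   # illustrative invertible matrix, det S = -1
A = matmul(matmul(S, B), inverse_2x2(S))   # A = S B S^{-1}

print(char_poly_2x2(A) == char_poly_2x2(B))
```

The entries of $A$ look nothing like those of $B$, yet the two characteristic polynomials agree coefficient by coefficient.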

Therefore, we can define the characteristic polynomial of an operator 
as the characteristic polynomial of its matrix in some basis. As we have 
discussed above, the result does not depend on the choice of the basis, so 
the characteristic polynomial of an operator is well defined. 


1.4. Complex vs real spaces. The fundamental theorem of algebra asserts 
that any polynomial (of degree at least 1) has a complex root. That 
implies that an operator in a finite-dimensional complex vector space has at 
least one eigenvalue, so its spectrum is non-empty. 


On the other hand it is easy to construct a linear transformation in a 
real vector space without real eigenvalues, the rotation $R_\alpha$, $\alpha \ne \pi n$, in $\mathbb{R}^2$ 
being one of the examples. Since it is usually assumed that eigenvalues should 
belong to the field of scalars (if an operator acts in a vector space over a field 
$\mathbb{F}$ the eigenvalues should be in $\mathbb{F}$), such operators have empty spectrum. 


Thus, the complex case (i.e. operators acting in complex vector spaces) 
seems to be the most natural setting for the spectral theory. Since $\mathbb{R} \subset \mathbb{C}$, 
we can always treat a real $n \times n$ matrix as an operator in $\mathbb{C}^n$ to allow 
complex eigenvalues. Treating real matrices as operators in $\mathbb{C}^n$ is typical in 
the spectral theory, and we will follow this agreement. Finding eigenvalues 
of a matrix (unless otherwise specified) will always mean finding all complex 
eigenvalues and not restricting oneself only to real ones. 
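
The rotation matrix makes the point concrete. Its characteristic polynomial is $z^2 - 2\cos\alpha\, z + 1$, whose discriminant $4\cos^2\alpha - 4$ is negative unless $\alpha$ is a multiple of $\pi$. The following sketch, with a function name invented for the example, computes the two roots with the standard `cmath` module.

```python
import cmath
import math

def rotation_eigenvalues(alpha):
    """Roots of det(R_alpha - zI) = z^2 - 2 cos(alpha) z + 1."""
    c = math.cos(alpha)
    disc = cmath.sqrt(4 * c * c - 4)   # purely imaginary unless sin(alpha) = 0
    return ((2 * c + disc) / 2, (2 * c - disc) / 2)

alpha = math.pi / 3                    # illustrative angle
l1, l2 = rotation_eigenvalues(alpha)
# The roots are cos(alpha) +/- i sin(alpha); neither is real here.
print(l1, l2)
```

Over $\mathbb{R}$ this operator has empty spectrum, while over $\mathbb{C}$ it has two distinct eigenvalues.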


Note that an operator in an abstract real vector space also can be 
interpreted as an operator in a complex space. A naive approach would be 
to fix a basis (recall that all spaces in this chapter are finite-dimensional), 
and then work with coordinates in this basis allowing complex coordinates: 
that will essentially be the move from operators in $\mathbb{R}^n$ to operators in $\mathbb{C}^n$ 
described above. 


This construction describes what is known as the complexification of a 
real vector space, and the result does not depend on the choice of a basis. A 
“high brow” abstract construction of the complexification, explaining why 
the result does not depend on the choice of a basis, is described below in 
Section 8.2 of Chapter 5. 


1.5. Multiplicities of eigenvalues. Let us remind the reader that if $p$ is 
a polynomial, and $\lambda$ is its root (i.e. $p(\lambda) = 0$) then $z - \lambda$ divides $p(z)$, i.e. $p$ 
can be represented as $p(z) = (z - \lambda)q(z)$, where $q$ is some polynomial. If 
$q(\lambda) = 0$, then $q$ also can be divided by $z - \lambda$, so $(z - \lambda)^2$ divides $p$ and so 
on. 

The largest positive integer $k$ such that $(z - \lambda)^k$ divides $p(z)$ is called 
the multiplicity of the root $\lambda$. 

If $\lambda$ is an eigenvalue of an operator (matrix) $A$, then it is a root of the 
characteristic polynomial $p(z) = \det(A - zI)$. The multiplicity of this root 
is called the (algebraic) multiplicity of the eigenvalue $\lambda$. 

Any polynomial $p(z) = \sum_{k=0}^{n} a_k z^k$ of degree $n$ has exactly $n$ complex 
roots, counting multiplicity. The words counting multiplicities mean that if 
a root has multiplicity $d$ we have to list (count) it $d$ times. In other words, 
$p$ can be represented as 
$$p(z) = a_n (z - \lambda_1)(z - \lambda_2) \cdots (z - \lambda_n),$$
where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are its complex roots, counting multiplicities. 


There is another notion of multiplicity of an eigenvalue: the dimension of 
the eigenspace $\operatorname{Ker}(A - \lambda I)$ is called the geometric multiplicity of the eigenvalue 
$\lambda$. 


Geometric multiplicity is not as widely used as algebraic multiplicity. 
So, when people say simply “multiplicity” they usually mean algebraic mul- 
tiplicity. 

Let us mention, that algebraic and geometric multiplicities of an eigen- 
value can differ. 


Proposition 1.1. Geometric multiplicity of an eigenvalue cannot exceed its 
algebraic multiplicity. 


Proof. See Exercise 1.9 below. 


1.6. Trace and determinant. 


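Theorem 1.2. Let $A$ be an $n \times n$ matrix, and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be its (complex) 
eigenvalues (counting multiplicities). Then 

1. $\operatorname{trace} A = \lambda_1 + \lambda_2 + \ldots + \lambda_n$. 

2. $\det A = \lambda_1 \lambda_2 \cdots \lambda_n$. 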


Proof. See Exercises 1.10, 1.11 below. 
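
Theorem 1.2 is easy to test numerically in the 2 x 2 case, including when the eigenvalues are complex. The sketch below is an illustration only: the helper `eigenvalues_2x2` is a made-up name that solves the characteristic equation with the quadratic formula, and the matrix is an arbitrary real matrix with non-real eigenvalues.

```python
import cmath

def eigenvalues_2x2(A):
    """Roots of det(A - zI) = z^2 - (a + d) z + (ad - bc)."""
    (a, b), (c, d) = A
    p, q = -(a + d), a * d - b * c
    disc = cmath.sqrt(p * p - 4 * q)
    return ((-p + disc) / 2, (-p - disc) / 2)

A = [[1, 2], [-2, 1]]        # illustrative real matrix, eigenvalues 1 +/- 2i
l1, l2 = eigenvalues_2x2(A)

trace = A[0][0] + A[1][1]                        # 2
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]      # 5

print(abs((l1 + l2) - trace) < 1e-12)   # sum of eigenvalues vs trace
print(abs((l1 * l2) - det) < 1e-12)     # product of eigenvalues vs determinant
```

The imaginary parts cancel in both the sum and the product, as they must for a real matrix.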


1.7. Eigenvalues of a triangular matrix. Computing eigenvalues is 
equivalent to finding roots of a characteristic polynomial of a matrix (or 
using some numerical method), which can be quite time consuming. 
However, there is one particular case, when we can just read eigenvalues off the 
matrix. Namely 


eigenvalues of a triangular matrix (counting multiplicities) are 
exactly the diagonal entries $a_{1,1}, a_{2,2}, \ldots, a_{n,n}$ 


By triangular here we mean either upper or lower triangular matrix. 
Since a diagonal matrix is a particular case of a triangular matrix (it is both 
upper and lower triangular), 


the eigenvalues of a diagonal matrix are its diagonal entries 


The proof of the statement about triangular matrices is trivial: we need 
to subtract $\lambda$ from the diagonal entries of $A$, and use the fact that the 
determinant of a triangular matrix is the product of its diagonal entries. We get 
the characteristic polynomial 
$$\det(A - \lambda I) = (a_{1,1} - \lambda)(a_{2,2} - \lambda) \cdots (a_{n,n} - \lambda)$$


and its roots are exactly $a_{1,1}, a_{2,2}, \ldots, a_{n,n}$. 
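
The product formula can be confirmed by brute force on a small triangular matrix. The sketch below evaluates $\det(A - \lambda I)$ by cofactor expansion along the first row (fine for tiny matrices, far too slow in general) and compares it with the product of the shifted diagonal entries. The function names are invented for the illustration, and the upper triangular matrix is the one from Exercise 2.6 of this chapter.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def char_poly_at(A, lam):
    """Value of det(A - lam*I) at a given scalar lam."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return det(shifted)

T = [[2, 6, -6],
     [0, 5, -2],
     [0, 0, 4]]

# The diagonal entries are roots of the characteristic polynomial.
print([char_poly_at(T, a) for a in (2, 5, 4)])
```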


Exercises. 

1.1. True or false: 

a) Every linear operator in an n-dimensional vector space has n distinct eigen- 
values; 

b) If a matrix has one eigenvector, it has infinitely many eigenvectors; 

c) There exists a square real matrix with no real eigenvalues; 

d) There exists a square matrix with no (complex) eigenvectors; 

e) Similar matrices always have the same eigenvalues; 

f) Similar matrices always have the same eigenvectors; 

g) A non-zero sum of two eigenvectors of a matrix $A$ is always an eigenvector; 

h) A non-zero sum of two eigenvectors of a matrix $A$ corresponding to the 
same eigenvalue $\lambda$ is always an eigenvector. 


1.2. Find characteristic polynomials, eigenvalues and eigenvectors of the following 
matrices: 


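$$\begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 \\ -1 & 4 \end{pmatrix}, \qquad \begin{pmatrix} \ldots & \ldots & \ldots \\ \ldots & \ldots & \ldots \\ 3 & 3 & 1 \end{pmatrix}.$$
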
1.3. Compute eigenvalues and eigenvectors of the rotation matrix 
$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$
Note that the eigenvalues (and eigenvectors) do not need to be real. 


1.4. Compute characteristic polynomials and eigenvalues of the following matrices: 


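$$\begin{pmatrix} 1 & 2 & 5 & 67 \\ 0 & 2 & 3 & 6 \\ 0 & 0 & -2 & 5 \\ 0 & 0 & 0 & 8 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 0 & 2 \\ 0 & \pi & 43 & 2 \\ 0 & 0 & 16 & 1 \\ 0 & 0 & 0 & 54 \end{pmatrix},$$
$$\begin{pmatrix} 4 & 0 & 0 & 0 \\ 1 & 3 & 0 & 0 \\ 2 & 4 & e & 0 \\ 3 & 3 & 1 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 4 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 2 & 4 & 0 & 0 \\ 3 & 3 & 1 & 1 \end{pmatrix}.$$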


Do not expand the characteristic polynomials, leave them as products. 


1.5. Prove that eigenvalues (counting multiplicities) of a triangular matrix coincide 
with its diagonal entries. 


1.6. An operator $A$ is called nilpotent if $A^k = 0$ for some $k$. Prove that if $A$ is 
nilpotent, then $\sigma(A) = \{0\}$ (i.e. that 0 is the only eigenvalue of $A$). 


1.7. Show that the characteristic polynomial of a block triangular matrix 


$$\begin{pmatrix} A & * \\ 0 & B \end{pmatrix},$$


where $A$ and $B$ are square matrices, coincides with $\det(A - \lambda I) \det(B - \lambda I)$. (Use 
Exercise 3.11 from Chapter 3). 


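1.8. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be a basis in a vector space $V$. Assume also that the first $k$ 
vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k$ of the basis are eigenvectors of an operator $A$, corresponding 
to an eigenvalue $\lambda$ (i.e. that $A\mathbf{v}_j = \lambda\mathbf{v}_j$, $j = 1, 2, \ldots, k$). Show that in this basis 
the matrix of the operator $A$ has block triangular form 


$$\begin{pmatrix} \lambda I_k & * \\ 0 & B \end{pmatrix},$$
where $I_k$ is the $k \times k$ identity matrix and $B$ is some $(n - k) \times (n - k)$ matrix. 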


1.9. Use the two previous exercises to prove that geometric multiplicity of an 
eigenvalue cannot exceed its algebraic multiplicity. 


1.10. Prove that the determinant of a matrix $A$ is the product of its eigenvalues 
(counting multiplicities). 

Hint: first show that $\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda)$, where 
$\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues (counting multiplicities). Then compare the free 
terms (terms without $\lambda$) or plug in $\lambda = 0$ to get the conclusion. 


1.11. Prove that the trace of a matrix equals the sum of eigenvalues in three steps. 
First, compute the coefficient of $\lambda^{n-1}$ in the right side of the equality 


$$\det(A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda).$$
Then show that $\det(A - \lambda I)$ can be represented as 
$$\det(A - \lambda I) = (a_{1,1} - \lambda)(a_{2,2} - \lambda) \cdots (a_{n,n} - \lambda) + q(\lambda)$$


where $q(\lambda)$ is a polynomial of degree at most $n - 2$. And finally, comparing the 
coefficients of $\lambda^{n-1}$ get the conclusion. 


2. Diagonalization. 


One of the applications of the spectral theory is the diagonalization of 
operators, which means given an operator to find a basis in which the matrix of 
the operator is diagonal. Such a basis does not always exist, i.e. not all 
operators can be diagonalized (are diagonalizable). Importance of diagonalizable 
operators comes from the fact that the powers, and more general functions, 
of diagonal matrices are easy to compute, so if we diagonalize an operator 
we can easily compute functions of it. 

We will explain how to compute functions of diagonalizable operators in 
this section. We also give a necessary and sufficient condition for an operator 
to be diagonalizable, as well as some simple sufficient conditions. 

Note also that for operators in $\mathbb{F}^n$ (matrices) the diagonalizability of $A$ 
means that it can be represented as $A = SDS^{-1}$, where $D$ is a diagonal 
matrix (and $S$, of course, is invertible); we will explain this shortly. 

Unless otherwise specified, all results in this section hold for both com- 
plex and real vector spaces (and even for spaces over arbitrary fields). 


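2.1. Preliminaries. Suppose an operator $A$ in a vector space $V$ is such 
that $V$ has a basis $\mathcal{B} = \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ of eigenvectors of $A$, with $\lambda_1, \lambda_2, \ldots, \lambda_n$ 
being the corresponding eigenvalues. Then the matrix of $A$ in this basis is 
the diagonal matrix with $\lambda_1, \lambda_2, \ldots, \lambda_n$ on the diagonal 


(2.1) $$[A]_{\mathcal{B}\mathcal{B}} = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\} = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}.$$


On the other hand, if the matrix of an operator $A$ in a basis $\mathcal{B} = 
\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ is given by (2.1) then trivially $A\mathbf{b}_k = \lambda_k\mathbf{b}_k$, i.e. $\lambda_k$ are 
eigenvalues and $\mathbf{b}_k$ are corresponding eigenvectors. 


Note that the above reasoning holds for both complex and real vector 
spaces (and even for vector spaces over arbitrary fields). 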


Applying the above reasoning to operators in $\mathbb{F}^n$ (matrices) we 
immediately get the following theorem. Note that while in this book $\mathbb{F}$ is either $\mathbb{C}$ 
or $\mathbb{R}$, this theorem holds for an arbitrary field $\mathbb{F}$. 


Theorem 2.1. A matrix $A$ (with values in $\mathbb{F}$) admits a representation $A = 
SDS^{-1}$, where $D$ is a diagonal matrix and $S$ is an invertible one (both with 
entries in $\mathbb{F}$) if and only if there exists a basis in $\mathbb{F}^n$ of eigenvectors of $A$. 


Moreover, in this case diagonal entries of D are the eigenvalues and the 
columns of S are the corresponding eigenvectors (column number k corre- 
sponds to kth diagonal entry of D). 


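Proof. Let $D = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, and let $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ be the columns 
of $S$ (note that since $S$ is invertible its columns form a basis in $\mathbb{F}^n$). Then 
the identity $A = SDS^{-1}$ means that $D = [A]_{\mathcal{B}\mathcal{B}}$. 


Indeed, $S = [I]_{\mathcal{S}\mathcal{B}}$ is the change of the coordinates matrix from $\mathcal{B}$ to 
the standard basis $\mathcal{S}$, so we get from $A = SDS^{-1}$ that $D = S^{-1}AS = 
[I]_{\mathcal{B}\mathcal{S}} A [I]_{\mathcal{S}\mathcal{B}}$, which means exactly that $D = [A]_{\mathcal{B}\mathcal{B}}$. 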


And as we just discussed above, $[A]_{\mathcal{B}\mathcal{B}} = D = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ if 
and only if $\lambda_k$ are the eigenvalues and $\mathbf{b}_k$ are the corresponding eigenvectors 
of $A$. 


Remark. Note that if a matrix admits the representation $A = SDS^{-1}$ with a 
diagonal matrix $D$, then a simple direct calculation shows that the columns of 
$S$ are eigenvectors of $A$ and diagonal entries of $D$ are corresponding 
eigenvalues. This gives an alternative proof of the corresponding statement in 
Theorem 2.1. 


As we discussed above, a diagonalizable operator $A : V \to V$ has exactly 
$n = \dim V$ eigenvalues (counting multiplicities). Any operator in a complex 
vector space $V$ has $n$ eigenvalues (counting multiplicities); an operator in a 
real space, on the other hand, could have no real eigenvalues. 


We will, as it is customary in the spectral theory, treat real matrices as 
operators in the complex space $\mathbb{C}^n$, thus allowing complex eigenvalues and 
eigenvectors. Unless otherwise specified we will mean by the diagonalization 
of a matrix its complex diagonalization, i.e. a representation $A = SDS^{-1}$ 
where matrices $S$ and $D$ can have complex entries. 

The question when a real matrix admits a real diagonalization ($A = 
SDS^{-1}$ with real matrices $S$ and $D$) is in fact a very simple one, see Theorem 
2.9 below. 


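2.2. Some motivations: functions of operators. Let the matrix of an 
operator $A$ in a basis $\mathcal{B} = \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ be a diagonal one given by (2.1). 
Then it is easy to find an $N$th power of the operator $A$. Namely, the matrix 
of $A^N$ in the basis $\mathcal{B}$ is 

$$[A^N]_{\mathcal{B}\mathcal{B}} = \operatorname{diag}\{\lambda_1^N, \lambda_2^N, \ldots, \lambda_n^N\} = \begin{pmatrix} \lambda_1^N & & & 0 \\ & \lambda_2^N & & \\ & & \ddots & \\ 0 & & & \lambda_n^N \end{pmatrix}.$$


Moreover, functions of the operator are also very easy to compute: for 
example the operator (matrix) exponent $e^{tA}$ is defined as 
$$e^{tA} = I + tA + \frac{t^2 A^2}{2!} + \frac{t^3 A^3}{3!} + \ldots = \sum_{k=0}^{\infty} \frac{t^k A^k}{k!},$$
and its matrix in the basis $\mathcal{B}$ is 

$$[e^{tA}]_{\mathcal{B}\mathcal{B}} = \operatorname{diag}\{e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t}\} = \begin{pmatrix} e^{\lambda_1 t} & & & 0 \\ & e^{\lambda_2 t} & & \\ & & \ddots & \\ 0 & & & e^{\lambda_n t} \end{pmatrix}.$$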

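Let now $A$ be an operator in $\mathbb{F}^n$. To find the matrices of the operators 
$A^N$ and $e^{tA}$ in the standard basis $\mathcal{S}$, we need to recall that the change of 
coordinate matrix $[I]_{\mathcal{S}\mathcal{B}}$ is the matrix with columns $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$. Let us 
call this matrix $S$, then according to the change of coordinates formula we 
have 

$$A = S \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix} S^{-1} = SDS^{-1},$$


where we use $D$ for the diagonal matrix in the middle. 


Similarly 
$$A^N = SD^NS^{-1} = S \begin{pmatrix} \lambda_1^N & & & 0 \\ & \lambda_2^N & & \\ & & \ddots & \\ 0 & & & \lambda_n^N \end{pmatrix} S^{-1}$$


and similarly for $e^{tA}$. 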


Another way of thinking about powers (or other functions) of 
diagonalizable operators is to see that if an operator $A$ can be represented as $A = SDS^{-1}$, 
then 

$$A^N = \underbrace{(SDS^{-1})(SDS^{-1}) \cdots (SDS^{-1})}_{N \text{ times}} = SD^NS^{-1}$$


and it is easy to compute the $N$th power of a diagonal matrix. 
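
The formula $A^N = SD^NS^{-1}$ can be compared with repeated multiplication on a small example. The sketch below is an illustration under stated assumptions: the function names are invented, the arithmetic is exact (standard `fractions` module), and the matrix together with its eigenvalues 5, $-3$ and eigenvectors $(1, 2)^T$, $(1, -2)^T$ is the one worked out in Section 2.7.1 below.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(S):
    """Exact inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("S is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def power_by_diagonalization(S, eigenvalues, N):
    """S D^N S^{-1}, where D = diag(eigenvalues) and the columns of S are eigenvectors."""
    D_N = [[eigenvalues[0] ** N, 0], [0, eigenvalues[1] ** N]]
    return matmul(matmul(S, D_N), inverse_2x2(S))

def power_naive(A, N):
    """A^N by N successive matrix multiplications."""
    result = [[1, 0], [0, 1]]
    for _ in range(N):
        result = matmul(result, A)
    return result

A = [[1, 2], [8, 1]]
S = [[1, 1], [2, -2]]     # columns: eigenvectors (1, 2)^T and (1, -2)^T
eigenvalues = [5, -3]

print(power_by_diagonalization(S, eigenvalues, 10) == power_naive(A, 10))
```

The diagonalized version needs only two scalar powers and two matrix products, regardless of $N$.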


2.3. The case of n distinct eigenvalues. We now present a very simple 
sufficient condition for an operator to be diagonalizable, see Corollary 2.3 
below. 


Theorem 2.2. Let $\lambda_1, \lambda_2, \ldots, \lambda_r$ be distinct eigenvalues of $A$, and let 
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be the corresponding eigenvectors. Then vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ 
are linearly independent. 


Proof. We will use induction on $r$. The case $r = 1$ is trivial, because by 
the definition an eigenvector is non-zero, and a system consisting of one 
non-zero vector is linearly independent. 


Suppose that the statement of the theorem is true for $r - 1$. Suppose 
there exists a non-trivial linear combination 


(2.2) $$c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \ldots + c_r\mathbf{v}_r = \sum_{k=1}^{r} c_k\mathbf{v}_k = \mathbf{0}.$$

Applying $A - \lambda_r I$ to (2.2) and using the fact that $(A - \lambda_r I)\mathbf{v}_r = \mathbf{0}$, while 
$(A - \lambda_r I)\mathbf{v}_k = (\lambda_k - \lambda_r)\mathbf{v}_k$ for $k < r$, we get 


$$\sum_{k=1}^{r-1} c_k(\lambda_k - \lambda_r)\mathbf{v}_k = \mathbf{0}.$$


By the induction hypothesis vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{r-1}$ are linearly 
independent, so $c_k(\lambda_k - \lambda_r) = 0$ for $k = 1, 2, \ldots, r - 1$. Since $\lambda_k \ne \lambda_r$ we can 
conclude that $c_k = 0$ for $k < r$. Then it follows from (2.2) that $c_r = 0$, 
i.e. we have the trivial linear combination. 


Corollary 2.3. If an operator $A : V \to V$ has exactly $n = \dim V$ distinct 
eigenvalues, then it is diagonalizable. 


Proof. For each eigenvalue $\lambda_k$ let $\mathbf{v}_k$ be a corresponding eigenvector (just 
pick one eigenvector for each eigenvalue). By Theorem 2.2 the system 
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is linearly independent, and since it consists of exactly $n = 
\dim V$ vectors it is a basis. 


2.4. Bases of subspaces (AKA direct sums of subspaces). To de- 
scribe diagonalizable operators we need to introduce some new definitions. 

Let $V_1, V_2, \ldots, V_p$ be subspaces of a vector space $V$. We say that the 
system of subspaces is a basis in $V$ if any vector $\mathbf{v} \in V$ admits a unique 
representation as a sum 


(2.3) $$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \ldots + \mathbf{v}_p = \sum_{k=1}^{p} \mathbf{v}_k, \qquad \mathbf{v}_k \in V_k.$$

We also say that a system of subspaces $V_1, V_2, \ldots, V_p$ is linearly independent 
if the equation 
$$\mathbf{v}_1 + \mathbf{v}_2 + \ldots + \mathbf{v}_p = \mathbf{0}, \qquad \mathbf{v}_k \in V_k$$
has only the trivial solution ($\mathbf{v}_k = \mathbf{0}$ $\forall k = 1, 2, \ldots, p$). 
Another way to phrase that is to say that a system of subspaces 
$V_1, V_2, \ldots, V_p$ is linearly independent if and only if any system of non-zero 
vectors $\mathbf{v}_k$, where $\mathbf{v}_k \in V_k$, is linearly independent. 


We say that the system of subspaces $V_1, V_2, \ldots, V_p$ is generating (or 
complete, or spanning) if any vector $\mathbf{v} \in V$ admits representation as (2.3) 
(not necessarily unique). 


Remark 2.4. From the above definition one can immediately see that 
Theorem 2.2 in fact states that the system of eigenspaces $E_k$ of an operator 
$A$ 

$$E_k := \operatorname{Ker}(A - \lambda_k I), \qquad \lambda_k \in \sigma(A),$$
is linearly independent. 


Remark 2.5. It is easy to see that similarly to the bases of vectors, a 
system of subspaces $V_1, V_2, \ldots, V_p$ is a basis if and only if it is generating 
and linearly independent. We leave the proof of this fact as an exercise for 
the reader. 


While very simple, 
this is a very impor- 
tant statement, and 
it will be used a lot! 
Please remember it. 

There is a simple example of a basis of subspaces. Let $V$ be a vector 
space with a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$. Split the set of indices $1, 2, \ldots, n$ into 
$p$ subsets $\Lambda_1, \Lambda_2, \ldots, \Lambda_p$, and define subspaces $V_k := \operatorname{span}\{\mathbf{v}_j : j \in \Lambda_k\}$. 
Clearly the subspaces $V_k$ form a basis of $V$. 

The following theorem shows that in the finite-dimensional case it is 
essentially the only possible example of a basis of subspaces. 


Theorem 2.6. Let $V_1, V_2, \ldots, V_p$ be a basis of subspaces, and let us have 
in each subspace $V_k$ a basis (of vectors) $\mathcal{B}_k$.² Then the union $\cup_k \mathcal{B}_k$ of these 
bases is a basis in $V$. 


To prove the theorem we need the following lemma 


Lemma 2.7. Let $V_1, V_2, \ldots, V_p$ be a linearly independent family of 
subspaces, and let us have in each subspace $V_k$ a linearly independent system $\mathcal{B}_k$ of 
vectors.³ Then the union $\mathcal{B} := \cup_k \mathcal{B}_k$ is a linearly independent system. 


Proof. The proof of the lemma is almost trivial, if one thinks a bit about 
it. The main difficulty in writing the proof is a choice of an appropriate 
notation. Instead of using two indices (one for the number $k$ and the other 
for the number of a vector in $\mathcal{B}_k$), let us use “flat” notation. 


Namely, let $n$ be the number of vectors in $\mathcal{B} := \cup_k \mathcal{B}_k$. Let us order the 
set $\mathcal{B}$, for example as follows: first list all vectors from $\mathcal{B}_1$, then all vectors 
in $\mathcal{B}_2$, etc, listing all vectors from $\mathcal{B}_p$ last. 

This way, we index all vectors in $\mathcal{B}$ by integers $1, 2, \ldots, n$, and the set of 
indices $\{1, 2, \ldots, n\}$ splits into the sets $\Lambda_1, \Lambda_2, \ldots, \Lambda_p$ such that the set $\mathcal{B}_k$ 
consists of vectors $\mathbf{b}_j : j \in \Lambda_k$. 


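Suppose we have a non-trivial linear combination 


(2.4) $$c_1\mathbf{b}_1 + c_2\mathbf{b}_2 + \ldots + c_n\mathbf{b}_n = \sum_{j=1}^{n} c_j\mathbf{b}_j = \mathbf{0}.$$


Denote 
$$\mathbf{v}_k = \sum_{j \in \Lambda_k} c_j\mathbf{b}_j.$$


Then (2.4) can be rewritten as 


$$\mathbf{v}_1 + \mathbf{v}_2 + \ldots + \mathbf{v}_p = \mathbf{0}.$$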


² We do not list the vectors in $\mathcal{B}_k$, one just should keep in mind that each $\mathcal{B}_k$ consists of 
finitely many vectors in $V_k$. 

³ Again, here we do not name each vector in $\mathcal{B}_k$ individually, we just keep in mind that each 
set $\mathcal{B}_k$ consists of finitely many vectors. 

Since $\mathbf{v}_k \in V_k$ and the system of subspaces $V_k$ is linearly independent, $\mathbf{v}_k = \mathbf{0}$ 
$\forall k$. That means that for every $k$ 


$$\sum_{j \in \Lambda_k} c_j\mathbf{b}_j = \mathbf{0},$$
and since the system of vectors $\mathbf{b}_j : j \in \Lambda_k$ (i.e. the system $\mathcal{B}_k$) is linearly 
independent, we have $c_j = 0$ for all $j \in \Lambda_k$. Since it is true for all $\Lambda_k$, we 
can conclude that $c_j = 0$ for all $j$. 


Proof of Theorem 2.6. To prove the theorem we will use the same 
notation as in the proof of Lemma 2.7, i.e. the system $\mathcal{B}_k$ consists of vectors $\mathbf{b}_j$, 
$j \in \Lambda_k$. 
Lemma 2.7 asserts that the system of vectors $\mathbf{b}_j$, $j = 1, 2, \ldots, n$ is 
linearly independent, so it only remains to show that the system is complete. 
Since the system of subspaces $V_1, V_2, \ldots, V_p$ is a basis, any vector $\mathbf{v} \in V$ 
can be represented as 


$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2 + \ldots + \mathbf{v}_p = \sum_{k=1}^{p} \mathbf{v}_k, \qquad \mathbf{v}_k \in V_k.$$

Since the vectors $\mathbf{b}_j$, $j \in \Lambda_k$ form a basis in $V_k$, the vectors $\mathbf{v}_k$ can be 
represented as 
$$\mathbf{v}_k = \sum_{j \in \Lambda_k} c_j\mathbf{b}_j,$$


and therefore $\mathbf{v} = \sum_{j=1}^{n} c_j\mathbf{b}_j$. 


2.5. Criterion of diagonalizability. First of all let us recall a simple 
necessary condition. Since the eigenvalues (counting multiplicities) of a 
diagonal matrix $D = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ are exactly $\lambda_1, \lambda_2, \ldots, \lambda_n$, we see 
that if an operator $A : V \to V$ is diagonalizable, it has exactly $n = \dim V$ 
eigenvalues (counting multiplicities). 

The theorem below holds for both real and complex vector spaces (and even 
for spaces over general fields). 


Theorem 2.8. Let an operator $A : V \to V$ have exactly $n = \dim V$ 
eigenvalues (counting multiplicities)⁴. Then $A$ is diagonalizable if and only if 
for each eigenvalue $\lambda$ the dimension of the eigenspace $\operatorname{Ker}(A - \lambda I)$ (i.e. the 
geometric multiplicity of $\lambda$) coincides with the algebraic multiplicity of $\lambda$. 


Proof. First of all let us note, that for a diagonal matrix, the algebraic 
and geometric multiplicities of eigenvalues coincide, and therefore the same 
holds for the diagonalizable operators. 


⁴ Since any operator in a complex vector space has exactly $n$ eigenvalues (counting 
multiplicities), this assumption is moot in the complex case. 

Let us now prove the other implication. Let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be 
eigenvalues of $A$, and let $E_k := \operatorname{Ker}(A - \lambda_k I)$ be the corresponding eigenspaces. 
According to Remark 2.4, the subspaces $E_k$, $k = 1, 2, \ldots, p$ are linearly 
independent. 

Let $\mathcal{B}_k$ be a basis in $E_k$. By Lemma 2.7 the system $\mathcal{B} = \cup_k \mathcal{B}_k$ is a 
linearly independent system of vectors. 

We know that each $\mathcal{B}_k$ consists of $\dim E_k$ (= multiplicity of $\lambda_k$) vectors. 
So the number of vectors in $\mathcal{B}$ is equal to the sum of multiplicities of 
eigenvalues $\lambda_k$. But the sum of multiplicities of the eigenvalues is the number of 
eigenvalues counting multiplicities, which is exactly $n = \dim V$. So, we have 
a linearly independent system of $n = \dim V$ eigenvectors, which means it is 
a basis. 


2.6. Real factorization. The theorem below is, in fact, already proven (it 
is essentially Theorem 2.8 for real spaces). We state it here to summarize 
the situation with real diagonalization of real matrices. 


Theorem 2.9. A real $n \times n$ matrix $A$ admits a real factorization (i.e. 
representation $A = SDS^{-1}$ where $S$ and $D$ are real matrices, $D$ is diagonal 
and $S$ is invertible) if and only if it admits complex factorization and all 
eigenvalues of $A$ are real. 


2.7. Some examples. 


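2.7.1. Real eigenvalues. Consider the matrix 


$$A = \begin{pmatrix} 1 & 2 \\ 8 & 1 \end{pmatrix}.$$


Its characteristic polynomial is equal to 


$$\begin{vmatrix} 1 - \lambda & 2 \\ 8 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^2 - 16$$
and its roots (eigenvalues) are $\lambda = 5$ and $\lambda = -3$. For the eigenvalue $\lambda = 5$ 
$$A - 5I = \begin{pmatrix} 1 - 5 & 2 \\ 8 & 1 - 5 \end{pmatrix} = \begin{pmatrix} -4 & 2 \\ 8 & -4 \end{pmatrix}.$$


A basis in its nullspace consists of one vector $(1, 2)^T$, so this is the 
corresponding eigenvector. 


Similarly, for $\lambda = -3$ 


$$A - \lambda I = A + 3I = \begin{pmatrix} 4 & 2 \\ 8 & 4 \end{pmatrix}$$
and the eigenspace $\operatorname{Ker}(A + 3I)$ is spanned by the vector $(1, -2)^T$. The 
matrix $A$ can be diagonalized as 


$$A = \begin{pmatrix} 1 & 2 \\ 8 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix} \begin{pmatrix} 5 & 0 \\ 0 & -3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & -2 \end{pmatrix}^{-1}.$$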


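2.7.2. Complex eigenvalues. Consider the matrix 


$$A = \begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}.$$


Its characteristic polynomial is 


$$\begin{vmatrix} 1 - \lambda & 2 \\ -2 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^2 + 2^2$$


and the eigenvalues (roots of the characteristic polynomial) are $\lambda = 1 \pm 2i$. 


For $\lambda = 1 + 2i$ 
$$A - \lambda I = \begin{pmatrix} -2i & 2 \\ -2 & -2i \end{pmatrix}.$$


This matrix has rank 1, so the eigenspace $\operatorname{Ker}(A - \lambda I)$ is spanned by one 
vector, for example by $(1, i)^T$. 

Since the matrix $A$ is real, we do not need to compute an eigenvector 
for $\lambda = 1 - 2i$: we can get it for free by taking the complex conjugate of the 
above eigenvector, see Exercise 2.2 below. So, for $\lambda = 1 - 2i$ a corresponding 
eigenvector is $(1, -i)^T$, and so the matrix $A$ can be diagonalized as 


$$A = \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} \begin{pmatrix} 1 + 2i & 0 \\ 0 & 1 - 2i \end{pmatrix} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix}^{-1}.$$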
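
The eigenvector computation above, including the conjugation shortcut, can be verified directly with Python's built-in complex numbers. This is a small check only; `mat_vec` is a helper name introduced for the sketch.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [-2, 1]]
lam = 1 + 2j
v = [1, 1j]                                   # the eigenvector (1, i)^T

# Conjugating both the eigenvalue and the eigenvector gives another eigenpair,
# because A has real entries.
lam_bar = lam.conjugate()
v_bar = [complex(x).conjugate() for x in v]   # (1, -i)^T

print(mat_vec(A, v) == [lam * x for x in v])
print(mat_vec(A, v_bar) == [lam_bar * x for x in v_bar])
```

Exact equality is safe here only because all entries are small integers; with general floating point data one would compare up to a tolerance.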


2.7.3. A non-diagonalizable matrix. Consider the matrix 


$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$


Its characteristic polynomial is 


$$\begin{vmatrix} 1 - \lambda & 1 \\ 0 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^2,$$


so $A$ has an eigenvalue 1 of multiplicity 2. But, it is easy to see that 
$\dim \operatorname{Ker}(A - I) = 1$ (1 pivot, so $2 - 1 = 1$ free variable). Therefore, the 
geometric multiplicity of the eigenvalue 1 is different from its algebraic 
multiplicity, so $A$ is not diagonalizable. 
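
The pivot count used above amounts to computing a rank: the geometric multiplicity of $\lambda$ is $n - \operatorname{rank}(A - \lambda I)$. The sketch below implements this with plain Gaussian elimination in exact rational arithmetic; the function names `rank` and `geometric_multiplicity` are invented for the illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix via Gaussian elimination with exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0                                    # number of pivots found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def geometric_multiplicity(A, lam):
    """dim Ker(A - lam*I) = n - rank(A - lam*I)."""
    n = len(A)
    shifted = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
               for i in range(n)]
    return n - rank(shifted)

A = [[1, 1], [0, 1]]
print(geometric_multiplicity(A, 1))   # smaller than the algebraic multiplicity 2
```

For the identity matrix the same function returns 2, matching the fact that the identity is (trivially) diagonal.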

There is also an explanation which does not use Theorem 2.8. Namely, 
we got that the eigenspace $\operatorname{Ker}(A - 1I)$ is one dimensional (spanned by the 
vector $(1, 0)^T$). If $A$ were diagonalizable, it would have a diagonal form 
$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ in some basis,⁵ and so the dimension of the eigenspace would be 2. 
Therefore $A$ cannot be diagonalized. 


Exercises. 
2.1. Let $A$ be an $n \times n$ matrix. True or false: 


a) $A^T$ has the same eigenvalues as $A$. 
b) $A^T$ has the same eigenvectors as $A$. 
c) If $A$ is diagonalizable, then so is $A^T$. 


Justify your conclusions. 


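2.2. Let $A$ be a square matrix with real entries, and let $\lambda$ be its complex eigenvalue. 
Suppose $\mathbf{v} = (v_1, v_2, \ldots, v_n)^T$ is a corresponding eigenvector, $A\mathbf{v} = \lambda\mathbf{v}$. Prove that 
$\overline{\lambda}$ is an eigenvalue of $A$ and $A\overline{\mathbf{v}} = \overline{\lambda}\,\overline{\mathbf{v}}$. Here $\overline{\mathbf{v}}$ is the complex conjugate of the 
vector $\mathbf{v}$, $\overline{\mathbf{v}} := (\overline{v}_1, \overline{v}_2, \ldots, \overline{v}_n)^T$. 


2.3. Let 
$$A = \begin{pmatrix} 4 & 3 \\ \ldots & \ldots \end{pmatrix}.$$
Find A? by diagonalizing $A$. 


2.4. Construct a matrix $A$ with eigenvalues 1 and 3 and corresponding eigenvectors 
$(1, 2)^T$ and $(1, 1)^T$. Is such a matrix unique? 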


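2.5. Diagonalize the following matrices, if possible: 
a) $\begin{pmatrix} 4 & -2 \\ 1 & 1 \end{pmatrix}$. 
b) $\begin{pmatrix} -1 & -1 \\ \ldots & \ldots \end{pmatrix}$. 
c) $\begin{pmatrix} -2 & 2 & 6 \\ 5 & 1 & -6 \\ -5 & 2 & 9 \end{pmatrix}$ ($\lambda = 2$ is one of the eigenvalues). 


2.6. Consider the matrix 
$$A = \begin{pmatrix} 2 & 6 & -6 \\ 0 & 5 & -2 \\ 0 & 0 & 4 \end{pmatrix}.$$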

a) Find its eigenvalues. Is it possible to find the eigenvalues without comput- 
ing? 
b) Is this matrix diagonalizable? Find out without computing anything. 


c) If the matrix is diagonalizable, diagonalize it. 


⁵ Note that the only linear transformation having matrix $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ in some basis is the 
identity transformation $I$. Since $A$ is definitely not the identity, we can immediately conclude 
that $A$ cannot be diagonalized, so counting dimension of the eigenspace is not necessary. 




2.7. Diagonalize the matrix 


2 0 6 
02 4 
0 0 4 


2.8. Find all square roots of the matrix 


$$A = (\,\ldots\,),$$


i.e. find all matrices $B$ such that $B^2 = A$. Hint: Finding a square root of a 
diagonal matrix is easy. You can leave your answer as a product. 


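2.9. Let us recall that the famous Fibonacci sequence: 


$$0, 1, 1, 2, 3, 5, 8, 13, 21, \ldots$$


is defined as follows: we put $\varphi_0 = 0$, $\varphi_1 = 1$ and define 


$$\varphi_{n+2} = \varphi_{n+1} + \varphi_n.$$


We want to find a formula for $\varphi_n$. To do this 


a) Find a $2 \times 2$ matrix $A$ such that 


$$\begin{pmatrix} \varphi_{n+2} \\ \varphi_{n+1} \end{pmatrix} = A \begin{pmatrix} \varphi_{n+1} \\ \varphi_n \end{pmatrix}.$$


Hint: Combine the trivial equation $\varphi_{n+1} = \varphi_{n+1}$ with the Fibonacci 
relation $\varphi_{n+2} = \varphi_{n+1} + \varphi_n$. 

b) Diagonalize $A$ and find a formula for $A^n$. 

c) Noticing that 


$$\begin{pmatrix} \varphi_{n+1} \\ \varphi_n \end{pmatrix} = A^n \begin{pmatrix} \varphi_1 \\ \varphi_0 \end{pmatrix} = A^n \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$


find a formula for $\varphi_n$. (You will need to compute an inverse and perform 
multiplication here). 


d) Show that the vector $(\varphi_{n+1}/\varphi_n, 1)^T$ converges to an eigenvector of $A$. 
What do you think, is it a coincidence? 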


2.10. Let $A$ be a $5 \times 5$ matrix with 3 eigenvalues (not counting multiplicities). 
Suppose we know that one eigenspace is three-dimensional. Can you say if $A$ is 
diagonalizable? 


2.11. Give an example of a $3 \times 3$ matrix which cannot be diagonalized. After you 
constructed the matrix, can you make it “generic”, so no special structure of the 
matrix could be seen? 


2.12. Let a non-zero matrix $A$ satisfy $A^5 = 0$. Prove that $A$ cannot be diagonalized. 
More generally, any non-zero nilpotent matrix, i.e. a non-zero matrix satisfying 
$A^N = 0$ for some $N$ cannot be diagonalized. 


2.13. Eigenvalues of a transposition: 

a) Consider the transformation $T$ in the space $M_{2 \times 2}$ of $2 \times 2$ matrices, $T(A) = 
A^T$. Find all its eigenvalues and eigenvectors. Is it possible to diagonalize 
this transformation? Hint: While it is possible to write a matrix of this 
linear transformation in some basis, compute characteristic polynomial, 
and so on, it is easier to find eigenvalues and eigenvectors directly from the 
definition. 


b) Can you do the same problem but in the space of $n \times n$ matrices? 


2.14. Prove that two subspaces $V_1$ and $V_2$ are linearly independent if and only if 
$V_1 \cap V_2 = \{\mathbf{0}\}$. 


Chapter 5 


Inner product spaces 


Theory of inner product spaces is developed only for real and complex spaces, 
so F in this Chapter is always R or C; the results usually do not generalize 
to spaces over arbitrary fields. 

Most of the results and calculations in this chapter hold (and the results 
have the same statements) in both real and complex cases. In rare situations 
when there is a difference between real and complex case, we state explicitly 
which case is considered: otherwise everything holds for both cases. 

Finally, when the results and calculations hold for both complex and 
real cases, we use formulas for the complex case; in the real case they give 
correct, although sometimes a bit more complicated, formulas. 


1. Inner product in $\mathbb{R}^n$ and $\mathbb{C}^n$. Inner product spaces. 


1.1. Inner product and norm in $\mathbb{R}^n$. In dimensions 2 and 3, we defined 
the length of a vector $\mathbf{x}$ (i.e. the distance from its endpoint to the origin) by 
the Pythagorean rule, for example in $\mathbb{R}^3$ the length of the vector is defined 
as 
$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + x_3^2}.$$


It is natural to generalize this formula for all $n$, to define the norm of the 
vector $\mathbf{x} \in \mathbb{R}^n$ as 


$$\|\mathbf{x}\| = \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.$$


The word norm is used as a fancy replacement for the word length. 

While the notation $\mathbf{x} \cdot \mathbf{y}$ and term “dot product” is often used for the inner 
product, for reasons which will be clear later, we prefer the notation $(\mathbf{x}, \mathbf{y})$. 

The dot product in $\mathbb{R}^3$ was defined as $\mathbf{x} \cdot \mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3$, where 
$\mathbf{x} = (x_1, x_2, x_3)^T$ and $\mathbf{y} = (y_1, y_2, y_3)^T$. 

Similarly, in $\mathbb{R}^n$ one can define the inner product $(\mathbf{x}, \mathbf{y})$ of two vectors 
$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$, $\mathbf{y} = (y_1, y_2, \ldots, y_n)^T$ by 


$$(\mathbf{x}, \mathbf{y}) := x_1y_1 + x_2y_2 + \ldots + x_ny_n = \mathbf{y}^T\mathbf{x},$$


so $\|\mathbf{x}\| = \sqrt{(\mathbf{x}, \mathbf{x})}$. 


Note that $\mathbf{y}^T\mathbf{x} = \mathbf{x}^T\mathbf{y}$, and we use the notation $\mathbf{y}^T\mathbf{x}$ only to be 
consistent. 


1.2. Inner product and norm in $\mathbb{C}^n$. Let us now define norm and inner
product for $\mathbb{C}^n$. As we have seen before, the complex space $\mathbb{C}^n$ is the most
natural space from the point of view of spectral theory: even if one starts
from a matrix with real coefficients (or an operator on a real vector space),
the eigenvalues can be complex, and one needs to work in a complex space.


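For a complex number $z = x + iy$, we have $|z|^2 = x^2 + y^2 = z\overline{z}$. If $\mathbf{z} \in \mathbb{C}^n$
is given by

$$\mathbf{z} = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = \begin{pmatrix} x_1 + i y_1 \\ x_2 + i y_2 \\ \vdots \\ x_n + i y_n \end{pmatrix},$$

it is natural to define its norm $\|\mathbf{z}\|$ by

$$\|\mathbf{z}\|^2 = \sum_{k=1}^n (x_k^2 + y_k^2) = \sum_{k=1}^n |z_k|^2.$$

Let us try to define an inner product on $\mathbb{C}^n$ such that $\|\mathbf{z}\|^2 = (\mathbf{z}, \mathbf{z})$. One of
the choices is to define $(\mathbf{z}, \mathbf{w})$ by

$$(\mathbf{z}, \mathbf{w}) = z_1 \overline{w}_1 + z_2 \overline{w}_2 + \ldots + z_n \overline{w}_n = \sum_{k=1}^n z_k \overline{w}_k;$$

and that will be our definition of the standard inner product in $\mathbb{C}^n$.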


To simplify the notation, let us introduce a new notion. For a matrix
$A$ let us define its Hermitian adjoint, or simply adjoint $A^*$ by $A^* = \overline{A^T}$,
meaning that we take the transpose of the matrix, and then take the complex
conjugate of each entry. Note, that for a real matrix $A$, $A^* = A^T$.


Using the notion of $A^*$, one can write the standard inner product in $\mathbb{C}^n$
as

$$(\mathbf{z}, \mathbf{w}) = \mathbf{w}^* \mathbf{z}.$$

Remark. It is easy to see that one can define a different inner product in $\mathbb{C}^n$ such
that $\|\mathbf{z}\|^2 = (\mathbf{z}, \mathbf{z})$, namely the inner product given by

$$(\mathbf{z}, \mathbf{w})_1 = \overline{z}_1 w_1 + \overline{z}_2 w_2 + \ldots + \overline{z}_n w_n = \mathbf{z}^* \mathbf{w}.$$

We did not specify what properties we want the inner product to satisfy, but $\mathbf{z}^* \mathbf{w}$
and $\mathbf{w}^* \mathbf{z}$ are the only reasonable choices giving $\|\mathbf{z}\|^2 = (\mathbf{z}, \mathbf{z})$.

Note, that the above two choices of the inner product are essentially equivalent:
the only difference between them is notational, because $(\mathbf{z}, \mathbf{w})_1 = (\mathbf{w}, \mathbf{z})$.

While the second choice of the inner product looks more natural, the first one,
$(\mathbf{z}, \mathbf{w}) = \mathbf{w}^* \mathbf{z}$, is more widely used, so we will use it as well.
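The two conventions in the remark above can be compared numerically. The following is a minimal Python sketch; the function names `inner` and `inner_alt` and the sample vectors are illustrative assumptions, not notation from the text.

```python
def inner(z, w):
    """Standard inner product (z, w) = w* z = sum of z_k * conj(w_k)."""
    return sum(zk * wk.conjugate() for zk, wk in zip(z, w))


def inner_alt(z, w):
    """Alternative choice (z, w)_1 = z* w = sum of conj(z_k) * w_k."""
    return sum(zk.conjugate() * wk for zk, wk in zip(z, w))


# Sample vectors in C^2 (arbitrary values chosen for illustration).
z = [1 + 2j, 3 - 1j]
w = [2j, 1 + 1j]

# ||z||^2 computed directly as the sum of x_k^2 + y_k^2.
norm_sq = sum(zk.real ** 2 + zk.imag ** 2 for zk in z)

print(inner(z, z), inner_alt(z, z), norm_sq)  # both choices reproduce ||z||^2
print(inner_alt(z, w), inner(w, z))           # (z, w)_1 coincides with (w, z)
```

Both functions agree on the diagonal, and swapping the arguments converts one convention into the other, which is the sense in which the difference is only notational.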


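1.3. Inner product spaces. The inner product we defined for $\mathbb{R}^n$ and $\mathbb{C}^n$
satisfies the following properties:

1. (Conjugate) symmetry: $(\mathbf{x}, \mathbf{y}) = \overline{(\mathbf{y}, \mathbf{x})}$; note, that for a real space,
this property is just symmetry, $(\mathbf{x}, \mathbf{y}) = (\mathbf{y}, \mathbf{x})$;
2. Linearity: $(\alpha \mathbf{x} + \beta \mathbf{y}, \mathbf{z}) = \alpha(\mathbf{x}, \mathbf{z}) + \beta(\mathbf{y}, \mathbf{z})$ for all vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$ and
all scalars $\alpha, \beta$;
3. Non-negativity: $(\mathbf{x}, \mathbf{x}) \ge 0$ $\forall \mathbf{x}$;
4. Non-degeneracy: $(\mathbf{x}, \mathbf{x}) = 0$ if and only if $\mathbf{x} = \mathbf{0}$.

Let $V$ be a (complex or real) vector space. An inner product on $V$ is a
function that assigns to each pair of vectors $\mathbf{x}$, $\mathbf{y}$ a scalar, denoted by $(\mathbf{x}, \mathbf{y})$,
such that the above properties 1-4 are satisfied.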


Note that for a real space V we assume that (x,y) is always real, and 
for a complex space the inner product (x, y) can be complex. 


A space $V$ together with an inner product on it is called an inner product
space. Given an inner product space, one defines the norm on it by

$$\|\mathbf{x}\| = \sqrt{(\mathbf{x}, \mathbf{x})}.$$

1.3.1. Examples. 


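Example 1.1. Let $V$ be $\mathbb{R}^n$ or $\mathbb{C}^n$. We already have an inner product
$(\mathbf{x}, \mathbf{y}) = \mathbf{y}^* \mathbf{x} = \sum_{k=1}^n x_k \overline{y}_k$ defined above.

This inner product is called the standard inner product in $\mathbb{R}^n$ or $\mathbb{C}^n$.

We will use symbol $\mathbb{F}$ to denote both $\mathbb{C}$ and $\mathbb{R}$. When we have some
statement about the space $\mathbb{F}^n$, it means the statement is true for both $\mathbb{R}^n$
and $\mathbb{C}^n$.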


Example 1.2. Let $V$ be the space $P_n$ of polynomials of degree at most $n$.
Define the inner product by

$$(f, g) = \int_{-1}^{1} f(t)\,\overline{g(t)}\,dt.$$

It is easy to check that the above properties 1-4 are satisfied.

This definition works both for complex and real cases. In the real case
we only allow polynomials with real coefficients, and we do not need the
complex conjugate here.


Let us recall, that for a square matrix $A$, its trace is defined as the sum
of the diagonal entries,

$$\operatorname{trace} A := \sum_{k=1}^n a_{k,k}.$$


Example 1.3. For the space $M_{m \times n}$ of $m \times n$ matrices let us define the
so-called Frobenius inner product by

$$(A, B) = \operatorname{trace}(B^* A).$$

Again, it is easy to check that the properties 1-4 are satisfied, i.e. that we
indeed defined an inner product.

Note, that

$$\operatorname{trace}(B^* A) = \sum_{j,k} A_{j,k} \overline{B}_{j,k},$$

so this inner product coincides with the standard inner product in $\mathbb{C}^{mn}$.
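The identity relating the trace formula to the entrywise sum can be checked on a small case. The sketch below uses plain nested lists for matrices; the helper names (`adjoint`, `matmul`, `trace`, `frobenius`, `entrywise`) and the sample matrices are assumptions made for illustration.

```python
def adjoint(M):
    """Hermitian adjoint: transpose, then conjugate every entry."""
    rows, cols = len(M), len(M[0])
    return [[complex(M[r][c]).conjugate() for r in range(rows)] for c in range(cols)]


def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[k][k] for k in range(len(M)))


def frobenius(A, B):
    """Frobenius inner product (A, B) = trace(B* A)."""
    return trace(matmul(adjoint(B), A))


def entrywise(A, B):
    """Sum over all entries of A_jk * conj(B_jk)."""
    return sum(a * complex(b).conjugate()
               for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


# Two 2 x 3 complex matrices (arbitrary sample values).
A = [[1, 2j, 0], [1 - 1j, 3, 2]]
B = [[2, 1, 1j], [0, 1 + 1j, 4]]

print(frobenius(A, B), entrywise(A, B))  # the two expressions agree
print(frobenius(A, A))                   # sum of |A_jk|^2, a non-negative real number
```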


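1.4. Properties of inner product. The statements we get in this section
are true for any abstract inner product space, not only for $\mathbb{F}^n$. To prove them
we use only properties 1-4 of the inner product.

First of all let us notice, that properties 1 and 2 imply that

2'. $(\mathbf{x}, \alpha \mathbf{y} + \beta \mathbf{z}) = \overline{\alpha}(\mathbf{x}, \mathbf{y}) + \overline{\beta}(\mathbf{x}, \mathbf{z})$.

Indeed,

$$(\mathbf{x}, \alpha \mathbf{y} + \beta \mathbf{z}) = \overline{(\alpha \mathbf{y} + \beta \mathbf{z}, \mathbf{x})} = \overline{\alpha(\mathbf{y}, \mathbf{x}) + \beta(\mathbf{z}, \mathbf{x})} = \overline{\alpha}\,\overline{(\mathbf{y}, \mathbf{x})} + \overline{\beta}\,\overline{(\mathbf{z}, \mathbf{x})} = \overline{\alpha}(\mathbf{x}, \mathbf{y}) + \overline{\beta}(\mathbf{x}, \mathbf{z}).$$

Note also that property 2 implies that for all vectors $\mathbf{x}$

$$(\mathbf{0}, \mathbf{x}) = (\mathbf{x}, \mathbf{0}) = 0.$$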
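Lemma 1.4. Let $\mathbf{x}$ be a vector in an inner product space $V$. Then $\mathbf{x} = \mathbf{0}$ if
and only if

$$(\mathbf{x}, \mathbf{y}) = 0 \qquad \forall \mathbf{y} \in V. \tag{1.1}$$

Proof. Since $(\mathbf{0}, \mathbf{y}) = 0$ we only need to show that (1.1) implies $\mathbf{x} = \mathbf{0}$.
Putting $\mathbf{y} = \mathbf{x}$ in (1.1) we get $(\mathbf{x}, \mathbf{x}) = 0$, so $\mathbf{x} = \mathbf{0}$.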


Applying the above lemma to the difference $\mathbf{x} - \mathbf{y}$ we get the following

Corollary 1.5. Let $\mathbf{x}, \mathbf{y}$ be vectors in an inner product space $V$. The equality
$\mathbf{x} = \mathbf{y}$ holds if and only if

$$(\mathbf{x}, \mathbf{z}) = (\mathbf{y}, \mathbf{z}) \qquad \forall \mathbf{z} \in V.$$

The following corollary is very simple, but will be used a lot.

Corollary 1.6. Suppose two operators $A, B : X \to Y$ satisfy

$$(A\mathbf{x}, \mathbf{y}) = (B\mathbf{x}, \mathbf{y}) \qquad \forall \mathbf{x} \in X,\ \forall \mathbf{y} \in Y.$$

Then $A = B$.

Proof. By the previous corollary (fix $\mathbf{x}$ and take all possible $\mathbf{y}$'s) we get
$A\mathbf{x} = B\mathbf{x}$. Since this is true for all $\mathbf{x} \in X$, the transformations $A$ and $B$
coincide.


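The following property relates the norm and the inner product.

Theorem 1.7 (Cauchy-Schwarz inequality).

$$|(\mathbf{x}, \mathbf{y})| \le \|\mathbf{x}\| \cdot \|\mathbf{y}\|.$$

Proof. The proof we are going to present is not the shortest one, but it
shows where the main ideas came from.

Let us consider the real case first. If $\mathbf{y} = \mathbf{0}$, the statement is trivial, so
we can assume that $\mathbf{y} \ne \mathbf{0}$. By the properties of an inner product, for all
scalars $t$

$$0 \le \|\mathbf{x} - t\mathbf{y}\|^2 = (\mathbf{x} - t\mathbf{y}, \mathbf{x} - t\mathbf{y}) = \|\mathbf{x}\|^2 - 2t(\mathbf{x}, \mathbf{y}) + t^2\|\mathbf{y}\|^2.$$

In particular, this inequality should hold for $t = \frac{(\mathbf{x}, \mathbf{y})}{\|\mathbf{y}\|^2}$. That is the point where the
above quadratic polynomial has a minimum: it can be computed, for example, by taking
the derivative in $t$ and equating it to 0. For this point the inequality becomes

$$0 \le \|\mathbf{x}\|^2 - 2\frac{(\mathbf{x}, \mathbf{y})^2}{\|\mathbf{y}\|^2} + \frac{(\mathbf{x}, \mathbf{y})^2}{\|\mathbf{y}\|^2} = \|\mathbf{x}\|^2 - \frac{(\mathbf{x}, \mathbf{y})^2}{\|\mathbf{y}\|^2},$$

which is exactly the inequality we need to prove.

There are several possible ways to treat the complex case. One is to
replace $\mathbf{x}$ by $\alpha\mathbf{x}$, where $\alpha$ is a complex constant, $|\alpha| = 1$, such that $(\alpha\mathbf{x}, \mathbf{y})$
is real, and then repeat the proof for the real case.

The other possibility is again to consider

$$0 \le \|\mathbf{x} - t\mathbf{y}\|^2 = (\mathbf{x} - t\mathbf{y}, \mathbf{x} - t\mathbf{y}) = (\mathbf{x}, \mathbf{x} - t\mathbf{y}) - t(\mathbf{y}, \mathbf{x} - t\mathbf{y}) = \|\mathbf{x}\|^2 - t(\mathbf{y}, \mathbf{x}) - \overline{t}(\mathbf{x}, \mathbf{y}) + |t|^2\|\mathbf{y}\|^2.$$

Substituting $t = \frac{(\mathbf{x}, \mathbf{y})}{\|\mathbf{y}\|^2} = \frac{\overline{(\mathbf{y}, \mathbf{x})}}{\|\mathbf{y}\|^2}$ into this inequality, we get

$$0 \le \|\mathbf{x}\|^2 - \frac{|(\mathbf{x}, \mathbf{y})|^2}{\|\mathbf{y}\|^2},$$

which is the inequality we need.

Note, that the above paragraph is in fact a complete formal proof of the
theorem. The reasoning before that was only to explain why we need to
pick this particular value of $t$.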

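An immediate corollary of the Cauchy-Schwarz inequality is the following
lemma.

Lemma 1.8 (Triangle inequality). For any vectors $\mathbf{x}$, $\mathbf{y}$ in an inner product
space

$$\|\mathbf{x} + \mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|.$$

Proof.

$$\|\mathbf{x} + \mathbf{y}\|^2 = (\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + (\mathbf{x}, \mathbf{y}) + (\mathbf{y}, \mathbf{x})$$
$$\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2|(\mathbf{x}, \mathbf{y})|$$
$$\le \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\|\mathbf{x}\| \cdot \|\mathbf{y}\| = (\|\mathbf{x}\| + \|\mathbf{y}\|)^2.$$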

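The following polarization identities allow one to reconstruct the inner
product from the norm:

Lemma 1.9 (Polarization identities). For $\mathbf{x}, \mathbf{y} \in V$

$$(\mathbf{x}, \mathbf{y}) = \frac{1}{4}\left(\|\mathbf{x} + \mathbf{y}\|^2 - \|\mathbf{x} - \mathbf{y}\|^2\right)$$

if $V$ is a real inner product space, and

$$(\mathbf{x}, \mathbf{y}) = \frac{1}{4}\sum_{\alpha = \pm 1, \pm i} \alpha\|\mathbf{x} + \alpha\mathbf{y}\|^2$$

if $V$ is a complex inner product space.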


The lemma is proved by direct computation. We leave the proof as an 
exercise for the reader. 
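Before attempting the proof, the complex polarization identity can be checked on one example. This is a minimal Python sketch; the vectors and the helper names `inner`, `norm_sq` and `polarization` are illustrative assumptions.

```python
def inner(x, y):
    """Standard inner product in C^n: sum of x_k * conj(y_k)."""
    return sum(xk * yk.conjugate() for xk, yk in zip(x, y))


def norm_sq(x):
    """Squared norm ||x||^2 = (x, x), returned as a real number."""
    return inner(x, x).real


def polarization(x, y):
    """(1/4) * sum over a in {1, -1, i, -i} of a * ||x + a y||^2."""
    total = 0
    for a in (1, -1, 1j, -1j):
        total += a * norm_sq([xk + a * yk for xk, yk in zip(x, y)])
    return total / 4


x = [1 + 2j, 3j]
y = [2 + 0j, 1 - 1j]

# The inner product recovered from norms alone, next to the direct computation.
print(polarization(x, y), inner(x, y))
```

Only the squared norms of four vectors enter `polarization`, which is the point of the lemma: the norm determines the inner product.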


Another important property of the norm in an inner product space can 
be also checked by direct calculation. 


Lemma 1.10 (Parallelogram Identity). For any vectors $\mathbf{u}$, $\mathbf{v}$

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2).$$

In 2-dimensional space this lemma relates sides of a parallelogram with
its diagonals, which explains the name. It is a well-known fact from planar
geometry.


1.5. Norm. Normed spaces. We have proved before that the norm $\|\mathbf{v}\|$
satisfies the following properties:

1. Homogeneity: $\|\alpha\mathbf{v}\| = |\alpha| \cdot \|\mathbf{v}\|$ for all vectors $\mathbf{v}$ and all scalars $\alpha$.

2. Triangle inequality: $\|\mathbf{u} + \mathbf{v}\| \le \|\mathbf{u}\| + \|\mathbf{v}\|$.

3. Non-negativity: $\|\mathbf{v}\| \ge 0$ for all vectors $\mathbf{v}$.

4. Non-degeneracy: $\|\mathbf{v}\| = 0$ if and only if $\mathbf{v} = \mathbf{0}$.

Suppose in a vector space $V$ we assigned to each vector $\mathbf{v}$ a number $\|\mathbf{v}\|$
such that above properties 1-4 are satisfied. Then we say that the function
$\mathbf{v} \mapsto \|\mathbf{v}\|$ is a norm. A vector space $V$ equipped with a norm is called a
normed space.

Any inner product space is a normed space, because the norm $\|\mathbf{v}\| =
\sqrt{(\mathbf{v}, \mathbf{v})}$ satisfies the above properties 1-4. However, there are many other
normed spaces. For example, given $p$, $1 \le p < \infty$, one can define the norm
$\|\cdot\|_p$ on $\mathbb{R}^n$ or $\mathbb{C}^n$ by

$$\|\mathbf{x}\|_p = \left(|x_1|^p + |x_2|^p + \ldots + |x_n|^p\right)^{1/p} = \left(\sum_{k=1}^n |x_k|^p\right)^{1/p}.$$

One can also define the norm $\|\cdot\|_\infty$ ($p = \infty$) by

$$\|\mathbf{x}\|_\infty = \max\{|x_k| : k = 1, 2, \ldots, n\}.$$

The norm $\|\cdot\|_p$ for $p = 2$ coincides with the regular norm obtained from
the inner product.


To check that $\|\cdot\|_p$ is indeed a norm one has to check that it satisfies
all the above properties 1-4. Properties 1, 3 and 4 are very easy to check;
we leave them as an exercise for the reader. The triangle inequality (property
2) is easy to check for $p = 1$ and $p = \infty$ (and we proved it for $p = 2$).

For all other $p$ the triangle inequality is true, but the proof is not so
simple, and we will not present it here. The triangle inequality for $\|\cdot\|_p$
even has a special name: it is called the Minkowski inequality, after the German
mathematician H. Minkowski.

Note, that the norm $\|\cdot\|_p$ for $p \ne 2$ cannot be obtained from an inner
product. It is easy to see that this norm is not obtained from the standard
inner product in $\mathbb{R}^n$ ($\mathbb{C}^n$). But we claim more! We claim that it is impossible
to introduce an inner product which gives rise to the norm $\|\cdot\|_p$, $p \ne 2$.

This statement is actually quite easy to prove. By Lemma 1.10 any norm
obtained from an inner product must satisfy the Parallelogram Identity. It
is easy to see that the Parallelogram Identity fails for the norm $\|\cdot\|_p$, $p \ne 2$,
and one can easily find a counterexample in $\mathbb{R}^2$, which then gives rise to a
counterexample in all other spaces.
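One possible counterexample in the plane uses the two standard basis vectors. The Python sketch below evaluates the difference between the two sides of the Parallelogram Identity for several values of p; the function names `p_norm` and `parallelogram_gap` are assumptions introduced for this illustration.

```python
def p_norm(x, p):
    """The norm ||x||_p; p = float('inf') gives the max norm."""
    if p == float("inf"):
        return max(abs(xk) for xk in x)
    return sum(abs(xk) ** p for xk in x) ** (1 / p)


def parallelogram_gap(u, v, p):
    """||u+v||^2 + ||u-v||^2 - 2(||u||^2 + ||v||^2) in the p-norm (zero if the identity holds)."""
    s = [a + b for a, b in zip(u, v)]
    d = [a - b for a, b in zip(u, v)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * (p_norm(u, p) ** 2 + p_norm(v, p) ** 2))


# Standard basis vectors of R^2.
u, v = [1.0, 0.0], [0.0, 1.0]

for p in (1, 1.5, 2, 3, float("inf")):
    print(p, parallelogram_gap(u, v, p))
```

For these two vectors the gap equals $2 \cdot 2^{2/p} - 4$ for finite $p$, which vanishes only when $p = 2$ (up to rounding in floating point); for $p = \infty$ it equals $-2$.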

In fact, the Parallelogram Identity, as the theorem below asserts, completely
characterizes norms obtained from an inner product.

Theorem 1.11. A norm in a normed space is obtained from some inner
product if and only if it satisfies the Parallelogram Identity

$$\|\mathbf{u} + \mathbf{v}\|^2 + \|\mathbf{u} - \mathbf{v}\|^2 = 2(\|\mathbf{u}\|^2 + \|\mathbf{v}\|^2) \qquad \forall \mathbf{u}, \mathbf{v} \in V.$$


Lemma 1.10 asserts that a norm obtained from an inner product satisfies 
the Parallelogram Identity. 


The converse implication is more complicated. If we are given a norm, 
and this norm came from an inner product, then we do not have any choice; 
this inner product must be given by the polarization identities, see Lemma 
1.9. But, we need to show that (x,y) which we got from the polarization 
identities is indeed an inner product, i.e. that it satisfies all the properties. 


It is indeed possible to verify that if the norm satisfies the parallelogram 
identity then the inner product (x,y) obtained from the polarization iden- 
tities is indeed an inner product (i.e. satisfies all the properties of an inner 
product). However, the proof is a bit too involved, so we do not present it 
here. 


Exercises. 


1.1. Compute 
siaiee, 22% Re (F==), (14+23)3,  Im((1+24)3). 


1— 2%’ 1-2 
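1.2. For vectors $\mathbf{x} = (1, 2i, 1 + i)^T$ and $\mathbf{y} = (i, 2 - i, 3)^T$ compute
a) $(\mathbf{x}, \mathbf{y})$, $\|\mathbf{x}\|^2$, $\|\mathbf{y}\|^2$, $\|\mathbf{y}\|$;
b) $(3\mathbf{x}, 2i\mathbf{y})$, $(2\mathbf{x}, i\mathbf{x} + 2\mathbf{y})$;
c) $\|\mathbf{x} + 2\mathbf{y}\|$.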


Remark: After you have done part a), you can do parts b) and c) without actually 
computing all vectors involved, just by using the properties of inner product. 


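1.3. Let $\|\mathbf{u}\| = 2$, $\|\mathbf{v}\| = 3$, $(\mathbf{u}, \mathbf{v}) = 2 + i$. Compute
$\|\mathbf{u} + \mathbf{v}\|$, $\|\mathbf{u} - \mathbf{v}\|^2$, $(\mathbf{u} + \mathbf{v}, \mathbf{u} - i\mathbf{v})$, $(\mathbf{u} + 3i\mathbf{v}, 4i\mathbf{u})$.
1.4. Prove that for vectors in an inner product space
$$\|\mathbf{x} + \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 + 2\operatorname{Re}(\mathbf{x}, \mathbf{y}).$$
Recall that $\operatorname{Re} z = \frac{1}{2}(z + \overline{z})$.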


1.5. Explain why each of the following is not an inner product on a given vector 
space: 




a) $(\mathbf{x}, \mathbf{y}) = x_1 y_1 - x_2 y_2$ on $\mathbb{R}^2$;
b) $(A, B) = \operatorname{trace}(A + B)$ on the space of real $2 \times 2$ matrices;
c) $(f, g) = \int_0^1 f'(t) g(t)\,dt$ on the space of polynomials; $f'(t)$ denotes derivative.
1.6 (Equality in Cauchy-Schwarz inequality). Prove that
$$|(\mathbf{x}, \mathbf{y})| = \|\mathbf{x}\| \cdot \|\mathbf{y}\|$$


if and only if one of the vectors is a multiple of the other. Hint: Analyze the proof 
of the Cauchy—Schwarz inequality. 


1.7. Prove the parallelogram identity for an inner product space $V$,
$$\|\mathbf{x} + \mathbf{y}\|^2 + \|\mathbf{x} - \mathbf{y}\|^2 = 2(\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2).$$


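1.8. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be a spanning set (in particular, a basis) in an inner product
space $V$. Prove that

a) If $(\mathbf{x}, \mathbf{v}) = 0$ for all $\mathbf{v} \in V$, then $\mathbf{x} = \mathbf{0}$;
b) If $(\mathbf{x}, \mathbf{v}_k) = 0$ $\forall k$, then $\mathbf{x} = \mathbf{0}$;
c) If $(\mathbf{x}, \mathbf{v}_k) = (\mathbf{y}, \mathbf{v}_k)$ $\forall k$, then $\mathbf{x} = \mathbf{y}$.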


1.9. Consider the space $\mathbb{R}^2$ with the norm $\|\cdot\|_p$, introduced in Section 1.5. For
$p = 1, 2, \infty$, draw the "unit ball" $B_p$ in the norm $\|\cdot\|_p$,

$$B_p := \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\|_p \le 1\}.$$

Can you guess what the balls $B_p$ for other $p$ look like?


2. Orthogonality. Orthogonal and orthonormal bases. 


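Definition 2.1. Two vectors $\mathbf{u}$ and $\mathbf{v}$ are called orthogonal (also perpendicular)
if $(\mathbf{u}, \mathbf{v}) = 0$. We will write $\mathbf{u} \perp \mathbf{v}$ to say that the vectors are
orthogonal.

Note, that for orthogonal vectors $\mathbf{u}$ and $\mathbf{v}$ we have the following, so-called
Pythagorean identity:

$$\|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2 \qquad \text{if } \mathbf{u} \perp \mathbf{v}.$$

The proof is straightforward computation,

$$\|\mathbf{u} + \mathbf{v}\|^2 = (\mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v}) = (\mathbf{u}, \mathbf{u}) + (\mathbf{v}, \mathbf{v}) + (\mathbf{u}, \mathbf{v}) + (\mathbf{v}, \mathbf{u}) = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2$$

($(\mathbf{u}, \mathbf{v}) = (\mathbf{v}, \mathbf{u}) = 0$ because of orthogonality).

Definition 2.2. We say that a vector $\mathbf{v}$ is orthogonal to a subspace $E$ if $\mathbf{v}$
is orthogonal to all vectors $\mathbf{w}$ in $E$.

We say that subspaces $E$ and $F$ are orthogonal if all vectors in $E$ are
orthogonal to $F$, i.e. all vectors in $E$ are orthogonal to all vectors in $F$.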


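The following lemma shows how to check that a vector is orthogonal to
a subspace.

Lemma 2.3. Let $E$ be spanned by vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. Then $\mathbf{v} \perp E$ if
and only if

$$\mathbf{v} \perp \mathbf{v}_k, \qquad \forall k = 1, 2, \ldots, r.$$

Proof. By the definition, if $\mathbf{v} \perp E$ then $\mathbf{v}$ is orthogonal to all vectors in $E$.
In particular, $\mathbf{v} \perp \mathbf{v}_k$, $k = 1, 2, \ldots, r$.

On the other hand, let $\mathbf{v} \perp \mathbf{v}_k$, $k = 1, 2, \ldots, r$. Since the vectors $\mathbf{v}_k$ span
$E$, any vector $\mathbf{w} \in E$ can be represented as a linear combination $\sum_{k=1}^r \alpha_k \mathbf{v}_k$.
Then

$$(\mathbf{v}, \mathbf{w}) = \sum_{k=1}^r \overline{\alpha}_k (\mathbf{v}, \mathbf{v}_k) = 0,$$

so $\mathbf{v} \perp \mathbf{w}$.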


Definition 2.4. A system of vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is called orthogonal if
any two vectors are orthogonal to each other (i.e. if $(\mathbf{v}_j, \mathbf{v}_k) = 0$ for $j \ne k$).

If, in addition, $\|\mathbf{v}_k\| = 1$ for all $k$, we call the system orthonormal.


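Lemma 2.5 (Generalized Pythagorean identity). Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be an
orthogonal system. Then

$$\left\|\sum_{k=1}^n \alpha_k \mathbf{v}_k\right\|^2 = \sum_{k=1}^n |\alpha_k|^2 \|\mathbf{v}_k\|^2.$$

This formula looks particularly simple for orthonormal systems, where
$\|\mathbf{v}_k\| = 1$.

Proof of the Lemma.

$$\left\|\sum_{k=1}^n \alpha_k \mathbf{v}_k\right\|^2 = \left(\sum_{k=1}^n \alpha_k \mathbf{v}_k, \sum_{j=1}^n \alpha_j \mathbf{v}_j\right) = \sum_{k=1}^n \sum_{j=1}^n \alpha_k \overline{\alpha}_j (\mathbf{v}_k, \mathbf{v}_j).$$

Because of orthogonality $(\mathbf{v}_k, \mathbf{v}_j) = 0$ if $j \ne k$. Therefore we only need to
sum the terms with $j = k$, which gives exactly

$$\sum_{k=1}^n |\alpha_k|^2 (\mathbf{v}_k, \mathbf{v}_k) = \sum_{k=1}^n |\alpha_k|^2 \|\mathbf{v}_k\|^2.$$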


Corollary 2.6. Any orthogonal system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ of non-zero vectors is
linearly independent.

Proof. Suppose for some $\alpha_1, \alpha_2, \ldots, \alpha_n$ we have $\sum_{k=1}^n \alpha_k \mathbf{v}_k = \mathbf{0}$. Then by
the Generalized Pythagorean identity (Lemma 2.5)

$$0 = \|\mathbf{0}\|^2 = \sum_{k=1}^n |\alpha_k|^2 \|\mathbf{v}_k\|^2.$$

Since $\|\mathbf{v}_k\| \ne 0$ ($\mathbf{v}_k \ne \mathbf{0}$) we conclude that

$$\alpha_k = 0 \qquad \forall k,$$

so only the trivial linear combination gives $\mathbf{0}$.


Remark. In what follows we will usually mean by an orthogonal system an 
orthogonal system of non-zero vectors. Since the zero vector 0 is orthogonal 
to everything, it always can be added to any orthogonal system, but it is 
really not interesting to consider orthogonal systems with zero vectors. 


2.1. Orthogonal and orthonormal bases. 


Definition 2.7. An orthogonal (orthonormal) system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ which
is also a basis is called an orthogonal (orthonormal) basis.

It is clear that if $\dim V = n$ then any orthogonal system of $n$ non-zero
vectors is an orthogonal basis.


As we studied before, to find coordinates of a vector in a basis one
needs to solve a linear system. However, for an orthogonal basis finding
coordinates of a vector is much easier. Namely, suppose $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is an
orthogonal basis, and let

$$\mathbf{x} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \ldots + \alpha_n \mathbf{v}_n = \sum_{j=1}^n \alpha_j \mathbf{v}_j.$$

Taking inner product of both sides of the equation with $\mathbf{v}_1$ we get

$$(\mathbf{x}, \mathbf{v}_1) = \sum_{j=1}^n \alpha_j (\mathbf{v}_j, \mathbf{v}_1) = \alpha_1 (\mathbf{v}_1, \mathbf{v}_1) = \alpha_1 \|\mathbf{v}_1\|^2$$

(all inner products $(\mathbf{v}_j, \mathbf{v}_1) = 0$ if $j \ne 1$), so

$$\alpha_1 = \frac{(\mathbf{x}, \mathbf{v}_1)}{\|\mathbf{v}_1\|^2}.$$

Similarly, multiplying both sides by $\mathbf{v}_k$, we get

$$(\mathbf{x}, \mathbf{v}_k) = \sum_{j=1}^n \alpha_j (\mathbf{v}_j, \mathbf{v}_k) = \alpha_k (\mathbf{v}_k, \mathbf{v}_k) = \alpha_k \|\mathbf{v}_k\|^2,$$

so

$$\alpha_k = \frac{(\mathbf{x}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}. \tag{2.1}$$

Therefore,

    to find coordinates of a vector in an orthogonal basis one does not
    need to solve a linear system, the coordinates are determined by
    the formula (2.1).

(Margin note to Remark 2.8 below: This is a very important remark allowing
one to translate any statement about the standard inner product space $\mathbb{F}^n$ to an
inner product space with an orthonormal basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.)

This formula is especially simple for orthonormal bases, when $\|\mathbf{v}_k\| = 1$.


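Namely, if $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is an orthonormal basis, any vector $\mathbf{v}$ can be
represented as

$$\mathbf{v} = \sum_{k=1}^n (\mathbf{v}, \mathbf{v}_k)\mathbf{v}_k. \tag{2.2}$$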

This formula is sometimes called (a baby version of) the abstract orthogonal 
Fourier decomposition. The classical (non-abstract) Fourier decomposition 
deals with a concrete orthonormal system (sines and cosines or complex 
exponentials). We call this formula a baby version because the real Fourier 
decomposition deals with infinite orthonormal systems. 
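The coordinate formula (2.1) can be tried on a small real example. The sketch below uses exact rational arithmetic from Python's standard library; the basis, the vector, and the function names `inner` and `coordinates` are assumptions chosen for illustration.

```python
from fractions import Fraction


def inner(x, y):
    """Standard inner product in R^n (real entries, so no conjugation is needed)."""
    return sum(a * b for a, b in zip(x, y))


def coordinates(x, basis):
    """Coordinates alpha_k = (x, v_k) / ||v_k||^2 of x in an orthogonal basis."""
    return [Fraction(inner(x, v), inner(v, v)) for v in basis]


# An orthogonal (not orthonormal) basis of R^3, chosen for illustration.
basis = [(1, 1, 1), (-1, 0, 1), (1, -2, 1)]
x = (3, 0, 4)

alphas = coordinates(x, basis)

# Rebuild x from its coordinates to confirm x = sum of alpha_k * v_k.
rebuilt = tuple(sum(a * v[i] for a, v in zip(alphas, basis)) for i in range(3))

print(alphas)
print(rebuilt)
```

No linear system is solved here: each coordinate comes from one inner product and one squared norm, as the boxed statement above says.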


Remark 2.8. The importance of orthonormal bases is that if we fix an
orthonormal basis in an inner product space $V$, we can work with coordinates
in this basis the same way we work with vectors in $\mathbb{F}^n$. Namely, as was
discussed in the very beginning of the book, see Remark 2.4 in Chapter
1, if we have a vector space $V$ (over a field $\mathbb{F}$) with a basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$,
then we can perform the standard vector operations (vector addition and
multiplication by a scalar) by working with the columns of coordinates in
the basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, in absolutely the same way we work with vectors in
the standard coordinate space $\mathbb{F}^n$.

Exercise 2.3 below shows that if we have an orthonormal basis in an
inner product space $V$, we can compute the inner product of 2 vectors in
$V$ by taking columns of their coordinates in this orthonormal basis and
computing the standard inner product (in $\mathbb{C}^n$ or $\mathbb{R}^n$) of these columns.

As will be shown below in Section 3, any finite-dimensional inner product
space has an orthonormal basis. Thus, the standard inner product spaces
$\mathbb{C}^n$ (or $\mathbb{R}^n$ in the case of real spaces) are essentially the only examples of
finite-dimensional inner product spaces.

Exercises. 

2.1. Find all vectors in $\mathbb{R}^4$ orthogonal to vectors $(1, 1, 1, 1)^T$ and $(1, 2, 3, 4)^T$.

2.2. Let $A$ be a real $m \times n$ matrix. Describe $(\operatorname{Ran} A^T)^\perp$, $(\operatorname{Ran} A)^\perp$.

2.3. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be an orthonormal basis in $V$.

a) Prove that for any $\mathbf{x} = \sum_{k=1}^n \alpha_k \mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k \mathbf{v}_k$

$$(\mathbf{x}, \mathbf{y}) = \sum_{k=1}^n \alpha_k \overline{\beta}_k.$$

b) Deduce from this the Parseval's identity

$$(\mathbf{x}, \mathbf{y}) = \sum_{k=1}^n (\mathbf{x}, \mathbf{v}_k)\overline{(\mathbf{y}, \mathbf{v}_k)}.$$

c) Assume now that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is only an orthogonal basis, not an
orthonormal one. Can you write down Parseval's identity in this case?

This problem shows that if we have an orthonormal basis, we can use the
coordinates in this basis absolutely the same way we use the standard coordinates
in $\mathbb{C}^n$ (or $\mathbb{R}^n$).

The problem below shows that we can define an inner product by declaring a 
basis to be an orthonormal one. 


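2.4. Let $V$ be a vector space and let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be a basis in $V$. For
$\mathbf{x} = \sum_{k=1}^n \alpha_k \mathbf{v}_k$, $\mathbf{y} = \sum_{k=1}^n \beta_k \mathbf{v}_k$ define $(\mathbf{x}, \mathbf{y}) := \sum_{k=1}^n \alpha_k \overline{\beta}_k$.
Prove that $(\mathbf{x}, \mathbf{y})$ defines an inner product in $V$.

2.5. Let $A$ be a real $m \times n$ matrix. Describe the set of all vectors in $\mathbb{F}^m$ orthogonal
to $\operatorname{Ran} A$.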


3. Orthogonal projection and Gram-Schmidt 
orthogonalization 


Recalling the definition of orthogonal projection from the classical planar 
(2-dimensional) geometry, one can introduce the following definition. Let E 
be a subspace of an inner product space V. 


Definition 3.1. For a vector $\mathbf{v}$ its orthogonal projection $P_E \mathbf{v}$ onto the
subspace $E$ is a vector $\mathbf{w}$ such that

1. $\mathbf{w} \in E$;
2. $\mathbf{v} - \mathbf{w} \perp E$.

We will use notation $\mathbf{w} = P_E \mathbf{v}$ for the orthogonal projection.


After introducing an object, it is natural to ask: 
1. Does the object exist? 
2. Is the object unique? 
3. How does one find it? 


We will show first that the projection is unique. Then we present a 
method of finding the projection, proving its existence. 


The following theorem shows why the orthogonal projection is important 
and also proves that it is unique. 


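Theorem 3.2. The orthogonal projection $\mathbf{w} = P_E \mathbf{v}$ minimizes the distance
from $\mathbf{v}$ to $E$, i.e. for all $\mathbf{x} \in E$

$$\|\mathbf{v} - \mathbf{w}\| \le \|\mathbf{v} - \mathbf{x}\|.$$

Moreover, if for some $\mathbf{x} \in E$

$$\|\mathbf{v} - \mathbf{w}\| = \|\mathbf{v} - \mathbf{x}\|,$$

then $\mathbf{x} = \mathbf{w}$.

Proof. Let $\mathbf{y} = \mathbf{w} - \mathbf{x}$. Then

$$\mathbf{v} - \mathbf{x} = \mathbf{v} - \mathbf{w} + \mathbf{w} - \mathbf{x} = \mathbf{v} - \mathbf{w} + \mathbf{y}.$$

Since $\mathbf{v} - \mathbf{w} \perp E$ we have $\mathbf{y} \perp \mathbf{v} - \mathbf{w}$ and so by Pythagorean Theorem

$$\|\mathbf{v} - \mathbf{x}\|^2 = \|\mathbf{v} - \mathbf{w}\|^2 + \|\mathbf{y}\|^2 \ge \|\mathbf{v} - \mathbf{w}\|^2.$$

Note that equality happens only if $\mathbf{y} = \mathbf{0}$, i.e. if $\mathbf{x} = \mathbf{w}$.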


The following proposition shows how to find an orthogonal projection if
we know an orthogonal basis in $E$.

Proposition 3.3. Let $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ be an orthogonal basis in $E$. Then the
orthogonal projection $P_E \mathbf{v}$ of a vector $\mathbf{v}$ is given by the formula

$$P_E \mathbf{v} = \sum_{k=1}^r \alpha_k \mathbf{v}_k, \qquad \text{where } \alpha_k = \frac{(\mathbf{v}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}.$$

In other words

$$P_E \mathbf{v} = \sum_{k=1}^r \frac{(\mathbf{v}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2} \mathbf{v}_k. \tag{3.1}$$

Note that the formula for $\alpha_k$ coincides with (2.1), i.e. this formula applied
to an orthogonal system (not a basis) gives us a projection onto its span.


Remark 3.4. It is easy to see now from formula (3.1) that the orthogonal
projection $P_E$ is a linear transformation.

One can also see linearity of $P_E$ directly, from the definition and uniqueness
of the orthogonal projection. Indeed, it is easy to check that for any $\mathbf{x}$
and $\mathbf{y}$ the vector $\alpha\mathbf{x} + \beta\mathbf{y} - (\alpha P_E \mathbf{x} + \beta P_E \mathbf{y})$ is orthogonal to any vector in
$E$, so by the definition $P_E(\alpha\mathbf{x} + \beta\mathbf{y}) = \alpha P_E \mathbf{x} + \beta P_E \mathbf{y}$.


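Remark 3.5. Recalling the definition of inner product in $\mathbb{C}^n$ and $\mathbb{R}^n$ one
can see from the above formula (3.1) that the matrix of the orthogonal projection
$P_E$ onto $E$ in $\mathbb{C}^n$ ($\mathbb{R}^n$) is given by

$$P_E = \sum_{k=1}^r \frac{1}{\|\mathbf{v}_k\|^2} \mathbf{v}_k \mathbf{v}_k^*, \tag{3.2}$$

where columns $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ form an orthogonal basis in $E$.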
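Formula (3.2) can be illustrated in the real case, where $\mathbf{v}_k^* = \mathbf{v}_k^T$. The Python sketch below builds the matrix with exact fractions for a two-dimensional subspace of $\mathbb{R}^3$; the basis, the vector, and the helper names `projection_matrix` and `matvec` are assumptions made for this example.

```python
from fractions import Fraction


def projection_matrix(basis):
    """P_E = sum over k of v_k v_k^T / ||v_k||^2 for an orthogonal basis of E (real case)."""
    n = len(basis[0])
    P = [[Fraction(0)] * n for _ in range(n)]
    for v in basis:
        norm_sq = sum(a * a for a in v)
        for i in range(n):
            for j in range(n):
                P[i][j] += Fraction(v[i] * v[j], norm_sq)
    return P


def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * a for m, a in zip(row, x)) for row in M]


# Orthogonal basis of a 2-dimensional subspace E of R^3 (illustrative choice).
basis = [(1, 1, 1), (-1, 0, 1)]
P = projection_matrix(basis)

v = (1, 0, 2)
w = matvec(P, v)                                # the projection P_E v
residual = [a - b for a, b in zip(v, w)]        # v - P_E v

print(w)
# The residual is orthogonal to both basis vectors, as Definition 3.1 requires.
print([sum(r * b for r, b in zip(residual, u)) for u in basis])
```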


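Proof of Proposition 3.3. Let

$$\mathbf{w} := \sum_{k=1}^r \alpha_k \mathbf{v}_k, \qquad \text{where } \alpha_k = \frac{(\mathbf{v}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}.$$

We want to show that $\mathbf{v} - \mathbf{w} \perp E$. By Lemma 2.3 it is sufficient to show
that $\mathbf{v} - \mathbf{w} \perp \mathbf{v}_k$, $k = 1, 2, \ldots, r$. Computing the inner product we get for
$k = 1, 2, \ldots, r$:

$$(\mathbf{v} - \mathbf{w}, \mathbf{v}_k) = (\mathbf{v}, \mathbf{v}_k) - (\mathbf{w}, \mathbf{v}_k) = (\mathbf{v}, \mathbf{v}_k) - \sum_{j=1}^r \alpha_j (\mathbf{v}_j, \mathbf{v}_k)$$
$$= (\mathbf{v}, \mathbf{v}_k) - \alpha_k (\mathbf{v}_k, \mathbf{v}_k) = (\mathbf{v}, \mathbf{v}_k) - \frac{(\mathbf{v}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\|\mathbf{v}_k\|^2 = 0.$$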

So, if we know an orthogonal basis in E we can find the orthogonal 
projection onto E. In particular, since any system consisting of one vector 
is an orthogonal system, we know how to perform orthogonal projection 
onto one-dimensional spaces. 


But how do we find an orthogonal projection if we are only given a basis 
in E? Fortunately, there exists a simple algorithm allowing one to get an 
orthogonal basis from a basis. 


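3.1. Gram-Schmidt orthogonalization algorithm. Suppose we have
a linearly independent system $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$. The Gram-Schmidt method
constructs from this system an orthogonal system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ such that

$$\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n\} = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}.$$

Moreover, for all $r \le n$ we get

$$\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r\} = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\}.$$

Now let us describe the algorithm.

Step 1. Put $\mathbf{v}_1 := \mathbf{x}_1$. Denote by $E_1 := \operatorname{span}\{\mathbf{x}_1\} = \operatorname{span}\{\mathbf{v}_1\}$.

Step 2. Define $\mathbf{v}_2$ by

$$\mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2, \mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1.$$

Define $E_2 = \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2\}$. Note that $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2\} = E_2$.

Step 3. Define $\mathbf{v}_3$ by

$$\mathbf{v}_3 := \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3, \mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3, \mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2.$$

Put $E_3 := \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$. Note that $\operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3\} = E_3$. Note also
that $\mathbf{x}_3 \notin E_2$ so $\mathbf{v}_3 \ne \mathbf{0}$.

Step r + 1. Suppose that we already made $r$ steps of the process, constructing
an orthogonal system (consisting of non-zero vectors) $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$
such that $E_r := \operatorname{span}\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r\} = \operatorname{span}\{\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_r\}$. Define

$$\mathbf{v}_{r+1} := \mathbf{x}_{r+1} - P_{E_r}\mathbf{x}_{r+1} = \mathbf{x}_{r+1} - \sum_{k=1}^r \frac{(\mathbf{x}_{r+1}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\mathbf{v}_k.$$

Note that $\mathbf{x}_{r+1} \notin E_r$ so $\mathbf{v}_{r+1} \ne \mathbf{0}$.

Continuing this algorithm we get an orthogonal system $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$.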


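3.2. An example. Suppose we are given vectors

$$\mathbf{x}_1 = (1, 1, 1)^T, \quad \mathbf{x}_2 = (0, 1, 2)^T, \quad \mathbf{x}_3 = (1, 0, 2)^T,$$

and we want to orthogonalize it by Gram-Schmidt. On the first step define

$$\mathbf{v}_1 = \mathbf{x}_1 = (1, 1, 1)^T.$$

On the second step we get

$$\mathbf{v}_2 = \mathbf{x}_2 - P_{E_1}\mathbf{x}_2 = \mathbf{x}_2 - \frac{(\mathbf{x}_2, \mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1.$$

Computing

$$(\mathbf{x}_2, \mathbf{v}_1) = \left(\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right) = 3, \qquad \|\mathbf{v}_1\|^2 = 3,$$

we get

$$\mathbf{v}_2 = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}.$$

Finally, define

$$\mathbf{v}_3 = \mathbf{x}_3 - P_{E_2}\mathbf{x}_3 = \mathbf{x}_3 - \frac{(\mathbf{x}_3, \mathbf{v}_1)}{\|\mathbf{v}_1\|^2}\mathbf{v}_1 - \frac{(\mathbf{x}_3, \mathbf{v}_2)}{\|\mathbf{v}_2\|^2}\mathbf{v}_2.$$

Computing

$$(\mathbf{x}_3, \mathbf{v}_1) = \left(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}\right) = 3, \qquad (\mathbf{x}_3, \mathbf{v}_2) = \left(\begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}\right) = 1, \qquad \|\mathbf{v}_2\|^2 = 2$$

($\|\mathbf{v}_1\|^2$ was already computed before) we get

$$\mathbf{v}_3 = \begin{pmatrix} 1 \\ 0 \\ 2 \end{pmatrix} - \frac{3}{3}\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1/2 \\ -1 \\ 1/2 \end{pmatrix}.$$

Remark. Since the multiplication by a scalar does not change the orthogonality,
one can multiply vectors $\mathbf{v}_k$ obtained by Gram-Schmidt by any
non-zero numbers.

In particular, in many theoretical constructions one normalizes vectors
$\mathbf{v}_k$ by dividing them by their respective norms $\|\mathbf{v}_k\|$. Then the resulting
system will be orthonormal, and the formulas will look simpler.

On the other hand, when performing the computations one may want
to avoid fractional entries by multiplying a vector by the least common
denominator of its entries. Thus one may want to replace the vector $\mathbf{v}_3$
from the above example by $(1, -2, 1)^T$.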
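The algorithm of Section 3.1 and the example of Section 3.2 can be sketched in a few lines of Python. This is a minimal illustration for real vectors using exact fractions; the function names `inner` and `gram_schmidt` are assumptions, and the input is assumed to be linearly independent (otherwise a division by zero occurs).

```python
from fractions import Fraction


def inner(x, y):
    """Standard inner product in R^n."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(xs):
    """Orthogonalize a linearly independent list of real vectors (no normalization)."""
    vs = []
    for x in xs:
        v = [Fraction(a) for a in x]
        for u in vs:
            # Subtract the projection of x onto u: (x, u) / ||u||^2 times u.
            coeff = inner(x, u) / inner(u, u)
            v = [vi - coeff * ui for vi, ui in zip(v, u)]
        vs.append(v)
    return vs


# The vectors x_1, x_2, x_3 from the example above.
xs = [(1, 1, 1), (0, 1, 2), (1, 0, 2)]
v1, v2, v3 = gram_schmidt(xs)

# Clearing denominators, as suggested in the remark, keeps orthogonality.
v3_scaled = [2 * a for a in v3]

print(v1, v2, v3)
print(v3_scaled)
```

The coefficients are computed from the original vector `x`, following the formula for $\mathbf{v}_{r+1}$ in the text; in exact arithmetic this gives the same result as subtracting projections from the partially reduced vector.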


3.3. Orthogonal complement. Decomposition $V = E \oplus E^\perp$.

Definition. For a subspace $E$ its orthogonal complement $E^\perp$ is the set of
all vectors orthogonal to $E$,

$$E^\perp := \{\mathbf{x} : \mathbf{x} \perp E\}.$$

If $\mathbf{x}, \mathbf{y} \perp E$ then for any linear combination $\alpha\mathbf{x} + \beta\mathbf{y} \perp E$ (can you see
why?). Therefore $E^\perp$ is a subspace.

By the definition of orthogonal projection any vector in an inner product
space $V$ admits a unique representation

$$\mathbf{v} = \mathbf{v}_1 + \mathbf{v}_2, \qquad \mathbf{v}_1 \in E, \quad \mathbf{v}_2 \perp E \ (\text{eqv. } \mathbf{v}_2 \in E^\perp)$$

(where clearly $\mathbf{v}_1 = P_E \mathbf{v}$).

This statement is often symbolically written as $V = E \oplus E^\perp$, which
means exactly that any vector admits the unique decomposition above.

The following proposition gives an important property of the orthogonal
complement.

Proposition 3.6. For a subspace $E$

$$(E^\perp)^\perp = E.$$

The proof is left as an exercise, see Exercise 3.12 below.


Exercises. 


3.1. Apply Gram-Schmidt orthogonalization to the system of vectors (1,2,—2)", 
(, —1, 4)", (2, 1, 1" 


3.2. Apply Gram-Schmidt orthogonalization to the system of vectors $(1, 2, 3)^T$,
$(1, 3, 1)^T$. Write the matrix of the orthogonal projection onto the 2-dimensional
subspace spanned by these vectors.

3.3. Complete an orthogonal system obtained in the previous problem to an
orthogonal basis in $\mathbb{R}^3$, i.e. add to the system some vectors (how many?) to get an
orthogonal basis.

Can you describe how to complete an orthogonal system to an orthogonal basis
in the general situation of $\mathbb{R}^n$ or $\mathbb{C}^n$?

3.4. Find the distance from a vector $(2, 3, 1)^T$ to the subspace spanned by the
vectors $(1, 2, 3)^T$, $(1, 3, 1)^T$. Note, that I am only asking to find the distance to the
subspace, not the orthogonal projection.

3.5. Find the orthogonal projection of a vector $(1, 1, 1, 1)^T$ onto the subspace
spanned by the vectors $\mathbf{v}_1 = (1, 3, 1, 1)^T$ and $\mathbf{v}_2 = (2, -1, 1, 0)^T$ (note that $\mathbf{v}_1 \perp \mathbf{v}_2$).

3.6. Find the distance from a vector $(1, 2, 3, 4)^T$ to the subspace spanned by the
vectors $\mathbf{v}_1 = (1, -1, 1, 0)^T$ and $\mathbf{v}_2 = (1, 2, 1, 1)^T$ (note that $\mathbf{v}_1 \perp \mathbf{v}_2$). Can you find
the distance without actually computing the projection? That would simplify the
calculations.

3.7. True or false: if $E$ is a subspace of $V$, then $\dim E + \dim(E^\perp) = \dim V$? Justify.


3.8. Let P be the orthogonal projection onto a subspace E of an inner product space 
V, dimV =n, dim E =r. Find the eigenvalues and the eigenvectors (eigenspaces). 
Find the algebraic and geometric multiplicities of each eigenvalue. 


3.9. (Using eigenvalues to compute determinants). 


a) Find the matrix of the orthogonal projection onto the one-dimensional
subspace in $\mathbb{R}^n$ spanned by the vector $(1, 1, \ldots, 1)^T$;

b) Let $A$ be the $n \times n$ matrix with all entries equal 1. Compute its eigenvalues
and their multiplicities (use the previous problem);

c) Compute eigenvalues (and multiplicities) of the matrix $A - I$, i.e. of the
matrix with zeroes on the main diagonal and ones everywhere else;

d) Compute $\det(A - I)$.


3.10 (Legendre's polynomials). Let an inner product on the space of polynomials
be defined by $(f, g) = \int_{-1}^{1} f(t)\overline{g(t)}\,dt$. Apply Gram-Schmidt orthogonalization to
the system $1, t, t^2, t^3$.

Legendre's polynomials are a particular case of the so-called orthogonal
polynomials, which play an important role in many branches of mathematics.


3.11. Let $P = P_E$ be the matrix of an orthogonal projection onto a subspace $E$.
Show that

a) The matrix $P$ is self-adjoint, meaning that $P^* = P$.
b) $P^2 = P$.

Remark: The above 2 properties completely characterize orthogonal projection,
i.e. any matrix $P$ satisfying these properties is the matrix of some orthogonal
projection. We will discuss this some time later.

3.12. Show that for a subspace $E$ we have $(E^\perp)^\perp = E$. Hint: It is easy to see
that $E$ is orthogonal to $E^\perp$ (why?). To show that any vector $\mathbf{x}$ orthogonal to $E^\perp$
belongs to $E$ use the decomposition $V = E \oplus E^\perp$ from Section 3.3 above.


3.13. Suppose $P$ is the orthogonal projection onto a subspace $E$, and $Q$ is the
orthogonal projection onto the orthogonal complement $E^\perp$.

a) What are $P + Q$ and $PQ$?

b) Show that $P - Q$ is its own inverse.


4. Least square solution. Formula for the orthogonal
projection


As it was discussed before in Chapter 2, the equation 
Ax=b 


has a solution if and only if b € Ran A. But what do we do to solve an 
equation that does not have a solution? 


This seems to be a silly question, because if there is no solution, then 
there is no solution. But, situations when we want to solve an equation that 
does not have a solution can appear naturally, for example, if we obtained 
the equation from an experiment. If we do not have any errors, the right side 
b belongs to the column space Ran A, and equation is consistent. But, in 
real life it is impossible to avoid errors in measurements, so it is possible that 
an equation that in theory should be consistent, does not have a solution. 
So, what can one do in this situation? 


4.1. Least square solution. The simplest idea is to write down the error

$$\|A\mathbf{x} - \mathbf{b}\|$$

and try to find $\mathbf{x}$ minimizing it. If we can find $\mathbf{x}$ such that the error is 0,
the system is consistent and we have an exact solution. Otherwise, we get the
so-called least square solution.

The term least square arises from the fact that minimizing $\|A\mathbf{x} - \mathbf{b}\|$ is
equivalent to minimizing

$$\|A\mathbf{x} - \mathbf{b}\|^2 = \sum_{k=1}^m |(A\mathbf{x})_k - b_k|^2 = \sum_{k=1}^m \left|\sum_{j=1}^n A_{k,j} x_j - b_k\right|^2,$$

i.e. to minimizing the sum of squares of linear functions.


There are several ways to find the least square solution. If we are in
$\mathbb{R}^n$, and everything is real, we can forget about absolute values. Then we
can just take partial derivatives with respect to $x_j$ and find the point where all of
them are 0, which gives us the minimum.
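
As a worked illustration of the derivative approach (a sketch for the real case only; the name $F$ for the squared error is introduced here for convenience and is not notation used elsewhere in the text), differentiating the sum of squares term by term gives

```latex
\[
F(\mathbf{x}) = \sum_{k=1}^m \Bigl( \sum_{j=1}^n A_{k,j} x_j - b_k \Bigr)^2,
\qquad
\frac{\partial F}{\partial x_l}
= 2 \sum_{k=1}^m A_{k,l} \Bigl( \sum_{j=1}^n A_{k,j} x_j - b_k \Bigr)
= 2 \bigl( A^T (A\mathbf{x} - \mathbf{b}) \bigr)_l ,
\qquad l = 1, 2, \ldots, n.
\]
```

So setting all $n$ partial derivatives to 0 amounts to the single matrix equation $A^T(A\mathbf{x} - \mathbf{b}) = \mathbf{0}$.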


4.1.1. Geometric approach. However, there is a simpler way of finding the
minimum. Namely, if we take all possible vectors $\mathbf{x}$, then $A\mathbf{x}$ gives us all
possible vectors in $\operatorname{Ran} A$, so the minimum of $\|A\mathbf{x} - \mathbf{b}\|$ is exactly the distance
from $\mathbf{b}$ to $\operatorname{Ran} A$. Therefore the value of $\|A\mathbf{x} - \mathbf{b}\|$ is minimal if and only if
$A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$, where $P_{\operatorname{Ran} A}$ stands for the orthogonal projection onto the
column space $\operatorname{Ran} A$.


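So, to find the least square solution we simply need to solve the equation

\[ A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}. \]

If we know an orthogonal basis $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in $\operatorname{Ran} A$, we can find the vector
$P_{\operatorname{Ran} A}\mathbf{b}$ by the formula

\[ P_{\operatorname{Ran} A}\mathbf{b} = \sum_{k=1}^n \frac{(\mathbf{b}, \mathbf{v}_k)}{\|\mathbf{v}_k\|^2}\, \mathbf{v}_k. \]

If we only know a basis in $\operatorname{Ran} A$, we need to use the Gram-Schmidt
orthogonalization to obtain an orthogonal basis from it.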


So, theoretically, the problem is solved, but the solution is not very
simple: it involves Gram-Schmidt orthogonalization, which can be
computationally intensive. Fortunately, there exists a simpler solution.


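4.1.2. Normal equation. Namely, $A\mathbf{x}$ is the orthogonal projection $P_{\operatorname{Ran} A}\mathbf{b}$
if and only if $\mathbf{b} - A\mathbf{x} \perp \operatorname{Ran} A$ (note that $A\mathbf{x} \in \operatorname{Ran} A$ for all $\mathbf{x}$).


If $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ are the columns of $A$, then the condition $\mathbf{b} - A\mathbf{x} \perp \operatorname{Ran} A$ can
be rewritten as
\[ \mathbf{b} - A\mathbf{x} \perp \mathbf{a}_k, \qquad \forall k = 1, 2, \ldots, n. \]
That means
\[ 0 = (\mathbf{b} - A\mathbf{x}, \mathbf{a}_k) = \mathbf{a}_k^*(\mathbf{b} - A\mathbf{x}) \qquad \forall k = 1, 2, \ldots, n. \]
Joining the rows $\mathbf{a}_k^*$ together we get that these equations are equivalent to
\[ A^*(\mathbf{b} - A\mathbf{x}) = \mathbf{0}, \]
which in turn is equivalent to the so-called normal equation
\[ A^*A\mathbf{x} = A^*\mathbf{b}. \]
A solution of this equation gives us the least square solution of $A\mathbf{x} = \mathbf{b}$.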


Note that the least square solution is unique if and only if $A^*A$ is
invertible.


4.2. Formula for the orthogonal projection. As we already discussed
above, if $\mathbf{x}$ is a solution of the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ (i.e. a least
square solution of $A\mathbf{x} = \mathbf{b}$), then $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$. So, to find the orthogonal
projection of $\mathbf{b}$ onto the column space $\operatorname{Ran} A$ we need to solve the normal
equation $A^*A\mathbf{x} = A^*\mathbf{b}$, and then multiply the solution by $A$.


If the operator $A^*A$ is invertible, the solution of the normal equation
$A^*A\mathbf{x} = A^*\mathbf{b}$ is given by $\mathbf{x} = (A^*A)^{-1}A^*\mathbf{b}$, so the orthogonal projection
$P_{\operatorname{Ran} A}\mathbf{b}$ can be computed as
\[ P_{\operatorname{Ran} A}\mathbf{b} = A(A^*A)^{-1}A^*\mathbf{b}. \]
Since this is true for all $\mathbf{b}$,
\[ P_{\operatorname{Ran} A} = A(A^*A)^{-1}A^* \]
is the formula for the matrix of the orthogonal projection onto $\operatorname{Ran} A$.
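
The formula can be tried out on a small matrix. The following is only a sketch under stated assumptions: the $3 \times 2$ matrix `A` is a hypothetical example (not taken from the text), its entries are real so that $A^* = A^T$, and the helper names `transpose`, `matmul` and `inverse_2x2` are introduced here for illustration. Exact rational arithmetic is used so that the checks $P^2 = P$, $P^* = P$ and $PA = A$ can be tested with plain equality.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    # Matrix product: entry (i, j) is the dot product of row i of X and column j of Y.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inverse_2x2(M):
    # Explicit inverse of a 2 x 2 matrix; assumes the determinant is nonzero.
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical example matrix with linearly independent columns (rank A = 2).
A = [[1, 0], [1, 1], [1, 2]]
At = transpose(A)  # real entries, so A* is just the transpose

# P = A (A*A)^{-1} A*
P = matmul(matmul(A, inverse_2x2(matmul(At, A))), At)

print(P)
print(matmul(P, P) == P, transpose(P) == P, matmul(P, A) == A)
```

The three printed checks correspond to the facts that an orthogonal projection is idempotent, self-adjoint, and acts as the identity on the columns of $A$.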

The following theorem implies that for an $m \times n$ matrix $A$ the matrix
$A^*A$ is invertible if and only if $\operatorname{rank} A = n$.


Theorem 4.1. For an $m \times n$ matrix $A$

\[ \operatorname{Ker} A = \operatorname{Ker}(A^*A). \]

Indeed, according to the rank theorem $\operatorname{Ker} A = \{\mathbf{0}\}$ if and only if
$\operatorname{rank} A = n$. Therefore $\operatorname{Ker}(A^*A) = \{\mathbf{0}\}$ if and only if $\operatorname{rank} A = n$. Since the
matrix $A^*A$ is square, it is invertible if and only if $\operatorname{rank} A = n$.

We leave the proof of the theorem as an exercise. To prove the equality
$\operatorname{Ker} A = \operatorname{Ker}(A^*A)$ one needs to prove two inclusions, $\operatorname{Ker}(A^*A) \subset \operatorname{Ker} A$
and $\operatorname{Ker} A \subset \operatorname{Ker}(A^*A)$. One of the inclusions is trivial; for the other one
use the fact that

\[ \|A\mathbf{x}\|^2 = (A\mathbf{x}, A\mathbf{x}) = (A^*A\mathbf{x}, \mathbf{x}). \]


4.3. An example: line fitting. Let us introduce a few examples where
the least square solution appears naturally. Suppose that we know that two
quantities $x$ and $y$ are related by the law $y = a + bx$. The coefficients $a$ and
$b$ are unknown, and we would like to find them from experimental data.

Suppose we run the experiment $n$ times, and we get $n$ pairs $(x_k, y_k)$,
$k = 1, 2, \ldots, n$. Ideally, all the points $(x_k, y_k)$ should be on a straight line,
but because of errors in measurements, it usually does not happen: the points
are usually close to some line, but not exactly on it. That is where the least
square solution helps!


Ideally, the coefficients $a$ and $b$ should satisfy the equations
\[ a + b x_k = y_k, \qquad k = 1, 2, \ldots, n \]
(note that here, $x_k$ and $y_k$ are some fixed numbers, and the unknowns are
$a$ and $b$). If it is possible to find such $a$ and $b$ we are lucky. If not, the
standard thing to do is to minimize the total quadratic error
\[ \sum_{k=1}^n |a + b x_k - y_k|^2. \]
But minimizing this error is exactly finding the least square solution of the
system
\[
\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
\]
(recall that $x_k$, $y_k$ are some given numbers, and the unknowns are $a$ and $b$).
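
4.3.1. An example. Suppose our data $(x_k, y_k)$ consist of the pairs
\[ (-2, 4), \quad (-1, 2), \quad (0, 1), \quad (2, 1), \quad (3, 1). \]
Then we need to find the least square solution of
\[
\begin{pmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} 4 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
\]
Then
\[
A^*A =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 2 & 3 \end{pmatrix}
\begin{pmatrix} 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}
=
\begin{pmatrix} 5 & 2 \\ 2 & 18 \end{pmatrix}
\]
and
\[
A^*\mathbf{b} =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 2 & 3 \end{pmatrix}
\begin{pmatrix} 4 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 9 \\ -5 \end{pmatrix},
\]
so the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ is rewritten as
\[
\begin{pmatrix} 5 & 2 \\ 2 & 18 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix}
=
\begin{pmatrix} 9 \\ -5 \end{pmatrix}.
\]
The solution of this equation is
\[ a = 2, \qquad b = -1/2, \]
so the best fitting straight line is
\[ y = 2 - \tfrac{1}{2}\, x. \]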
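
The computation in this example can be reproduced mechanically. The sketch below is one possible way to do it, not a recommended numerical method: it forms $A^*A$ and $A^*\mathbf{b}$ for the data above and solves the normal equation by Gauss-Jordan elimination in exact rational arithmetic. The helper names (`transpose`, `matmul`, `solve`, `least_squares`) are introduced here for illustration, and `solve` assumes the matrix it is given is invertible.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def solve(M, rhs):
    """Solve the square system M x = rhs by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot entry becomes 1.
        aug[col] = [v / aug[col][col] for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def least_squares(A, b):
    At = transpose(A)  # the data are real, so A* = A^T
    AtA = matmul(At, A)
    Atb = [sum(a * y for a, y in zip(row, b)) for row in At]
    return AtA, Atb, solve(AtA, Atb)

xs = [-2, -1, 0, 2, 3]
ys = [4, 2, 1, 1, 1]
A = [[1, x] for x in xs]  # row k is (1, x_k); the unknowns are (a, b)
AtA, Atb, coeffs = least_squares(A, ys)
print(AtA, Atb, coeffs)
```

Running it gives the matrix $\begin{pmatrix} 5 & 2 \\ 2 & 18 \end{pmatrix}$, the right side $(9, -5)^T$ and the coefficients $a = 2$, $b = -1/2$ found above.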


4.4. Other examples: curves and planes. The least square method is
not limited to line fitting. It can also be applied to more general curves,
as well as to surfaces in higher dimensions.

The only constraint here is that the parameters we want to find be
involved linearly. The general algorithm is as follows:


1. Find the equations that your data should satisfy if there is an exact fit;


2. Write these equations as a linear system, where the unknowns are the
parameters you want to find. Note that the system need not be
consistent (and usually is not);


3. Find the least square solution of the system.
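
4.4.1. An example: curve fitting. For example, suppose we know that the
relation between $x$ and $y$ is given by the quadratic law $y = a + bx + cx^2$, so
we want to fit a parabola $y = a + bx + cx^2$ to the data. Then our unknowns
$a$, $b$, $c$ should satisfy the equations
\[ a + b x_k + c x_k^2 = y_k, \qquad k = 1, 2, \ldots, n, \]
or, in matrix form,
\[
\begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & x_n^2 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.
\]
For example, for the data from the previous example we need to find the
least square solution of
\[
\begin{pmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} 4 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix}.
\]
Then
\[
A^*A =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 2 & 3 \\ 4 & 1 & 0 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 1 & -2 & 4 \\ 1 & -1 & 1 \\ 1 & 0 & 0 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{pmatrix}
=
\begin{pmatrix} 5 & 2 & 18 \\ 2 & 18 & 26 \\ 18 & 26 & 114 \end{pmatrix}
\]
and
\[
A^*\mathbf{b} =
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ -2 & -1 & 0 & 2 & 3 \\ 4 & 1 & 0 & 4 & 9 \end{pmatrix}
\begin{pmatrix} 4 \\ 2 \\ 1 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 9 \\ -5 \\ 31 \end{pmatrix}.
\]
Therefore the normal equation $A^*A\mathbf{x} = A^*\mathbf{b}$ is
\[
\begin{pmatrix} 5 & 2 & 18 \\ 2 & 18 & 26 \\ 18 & 26 & 114 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} 9 \\ -5 \\ 31 \end{pmatrix},
\]
which has the unique solution
\[ a = 86/77, \qquad b = -62/77, \qquad c = 43/154. \]
Therefore,
\[ y = \frac{86}{77} - \frac{62}{77}\, x + \frac{43}{154}\, x^2 \]
is the best fitting parabola.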
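
As a sanity check on the arithmetic, one can substitute the quoted coefficients back into the normal equation. The short sketch below does this in exact rational arithmetic; the variable names (`AtA`, `Atb`, `sol`, `residual`) are chosen here for illustration only.

```python
from fractions import Fraction

xs = [-2, -1, 0, 2, 3]
ys = [4, 2, 1, 1, 1]

# Row k of A is (1, x_k, x_k^2); the unknown vector is (a, b, c).
A = [[1, x, x * x] for x in xs]

# Entries of A^T A and A^T b, computed directly from their definitions.
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Atb = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(3)]

# The solution quoted in the text.
sol = [Fraction(86, 77), Fraction(-62, 77), Fraction(43, 154)]

# Residual of the normal equation; all zeros means sol solves it exactly.
residual = [sum(AtA[i][j] * sol[j] for j in range(3)) - Atb[i] for i in range(3)]
print(AtA, Atb, residual)
```

A zero residual confirms that $(a, b, c)$ solves the $3 \times 3$ system; uniqueness follows from the invertibility of $A^*A$.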

4.4.2. Plane fitting. As another example, let us fit a plane $z = a + bx + cy$
to the data
\[ (x_k, y_k, z_k) \in \mathbb{R}^3, \qquad k = 1, 2, \ldots, n. \]
The equations we should have in the case of exact fit are
\[ a + b x_k + c y_k = z_k, \qquad k = 1, 2, \ldots, n, \]
or, in matrix form,
\[
\begin{pmatrix} 1 & x_1 & y_1 \\ 1 & x_2 & y_2 \\ \vdots & \vdots & \vdots \\ 1 & x_n & y_n \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
=
\begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix}.
\]
So, to find the best fitting plane, we need to find the least square solution of
this system (the unknowns are $a$, $b$, $c$).


Exercises. 


4.1. Find the least square solution of the system
\[
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{pmatrix} \mathbf{x}
=
\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}.
\]


4.2. Find the matrix of the orthogonal projection $P$ onto the column space of
\[
\begin{pmatrix} 1 & 1 \\ 2 & 1 \\ -2 & 4 \end{pmatrix}.
\]
Use two methods: Gram-Schmidt orthogonalization and the formula for the projection.
Compare the results.

4.3. Find the best straight line fit (least square solution) to the points $(-2, 4)$,
$(-1, 3)$, $(0, 1)$, $(2, 0)$.

4.4. Fit a plane $z = a + bx + cy$ to the four points $(1,1,3)$, $(0,3,6)$, $(2,1,5)$, $(0,0,0)$.
To do that

a) Find 4 equations with 3 unknowns $a$, $b$, $c$ such that the plane passes through
all 4 points (this system does not have to have a solution);

b) Find the least square solution of the system.


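4.5. Minimal norm solution. Let an equation $A\mathbf{x} = \mathbf{b}$ have a solution, and let $A$ have
a non-trivial kernel (so the solution is not unique). Prove that


a) There exists a unique solution $\mathbf{x}_0$ of $A\mathbf{x} = \mathbf{b}$ minimizing the norm $\|\mathbf{x}\|$,
i.e. that there exists a unique $\mathbf{x}_0$ such that $A\mathbf{x}_0 = \mathbf{b}$ and $\|\mathbf{x}_0\| \le \|\mathbf{x}\|$ for any
$\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{b}$.


b) $\mathbf{x}_0 = P_{(\operatorname{Ker} A)^\perp}\mathbf{x}$ for any $\mathbf{x}$ satisfying $A\mathbf{x} = \mathbf{b}$.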

4.6. Minimal norm least square solution. Applying the previous problem to the
equation $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$ show that


a) There exists a unique least square solution $\mathbf{x}_0$ of $A\mathbf{x} = \mathbf{b}$ minimizing the
norm $\|\mathbf{x}\|$.


b) $\mathbf{x}_0 = P_{(\operatorname{Ker} A)^\perp}\mathbf{x}$ for any least square solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$.


5. Adjoint of a linear transformation. Fundamental subspaces revisited.


5.1. Adjoint matrices and adjoint operators. Let us recall that for an
$m \times n$ matrix $A$ its Hermitian adjoint (or simply adjoint) $A^*$ is defined by
$A^* := \overline{A^T}$. In other words, the matrix $A^*$ is obtained from the transposed
matrix $A^T$ by taking the complex conjugate of each entry.


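The following identity is the main property of the adjoint matrix:

\[ (A\mathbf{x}, \mathbf{y}) = (\mathbf{x}, A^*\mathbf{y}) \qquad \forall \mathbf{x} \in \mathbb{C}^n, \ \forall \mathbf{y} \in \mathbb{C}^m. \]

Before proving this identity, let us introduce some useful formulas. Let us
recall that for transposed matrices we have the identity $(AB)^T = B^T A^T$.
Since for complex numbers $z$ and $w$ we have $\overline{zw} = \overline{z}\,\overline{w}$, the identity
\[ (AB)^* = B^*A^* \]
holds for the adjoint.
Also, since $(A^T)^T = A$ and $\overline{\overline{z}} = z$,
\[ (A^*)^* = A. \]
Now, we are ready to prove the main identity:
\[ (A\mathbf{x}, \mathbf{y}) = \mathbf{y}^*A\mathbf{x} = (A^*\mathbf{y})^*\mathbf{x} = (\mathbf{x}, A^*\mathbf{y}); \]
the first and the last equalities here follow from the definition of the inner
product in $\mathbb{F}^n$, and the middle one follows from the fact that
\[ (A^*\mathbf{x})^* = \mathbf{x}^*(A^*)^* = \mathbf{x}^*A. \]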
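
The main identity is easy to test numerically. The sketch below uses a hypothetical $3 \times 2$ complex matrix and hypothetical vectors (none of them from the text), with the standard inner product $(\mathbf{x}, \mathbf{y}) = \mathbf{y}^*\mathbf{x} = \sum_k x_k \overline{y_k}$. The function names `adjoint`, `apply` and `inner` are introduced here for illustration. All entries are small Gaussian integers, so the floating point arithmetic involved is exact.

```python
def adjoint(A):
    # Conjugate transpose: entry (j, i) of A* is the conjugate of entry (i, j) of A.
    rows, cols = len(A), len(A[0])
    return [[complex(A[i][j]).conjugate() for i in range(rows)] for j in range(cols)]

def apply(A, x):
    # Matrix-vector product A x.
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def inner(x, y):
    # Standard inner product (x, y) = y* x = sum_k x_k * conj(y_k).
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

A = [[1 + 2j, 3], [0, 1j], [2 - 1j, 4 + 1j]]   # 3 x 2, maps C^2 to C^3
x = [1 + 1j, 2 - 1j]                           # vector in C^2
y = [1j, 1, 2 + 3j]                            # vector in C^3

lhs = inner(apply(A, x), y)            # (Ax, y)
rhs = inner(x, apply(adjoint(A), y))   # (x, A*y)
print(lhs, rhs)
```

Both sides come out equal, and applying `adjoint` twice returns the original matrix, in agreement with $(A^*)^* = A$.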
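
5.1.1. Uniqueness of the adjoint. The above main identity
$(A\mathbf{x}, \mathbf{y}) = (\mathbf{x}, A^*\mathbf{y})$ is often used as the definition of the adjoint operator. Let us
first notice that the adjoint operator is unique: if a matrix $B$ satisfies
\[ (A\mathbf{x}, \mathbf{y}) = (\mathbf{x}, B\mathbf{y}) \qquad \forall \mathbf{x}, \mathbf{y}, \]
then $B = A^*$. Indeed, by the definition of $A^*$, for a given $\mathbf{y}$ we have
\[ (\mathbf{x}, A^*\mathbf{y}) = (\mathbf{x}, B\mathbf{y}) \qquad \forall \mathbf{x}, \]
and therefore by Corollary 1.5 $A^*\mathbf{y} = B\mathbf{y}$. Since this is true for all $\mathbf{y}$, the
linear transformations, and therefore the matrices $A^*$ and $B$, coincide.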

5.1.2. Adjoint transformation in abstract setting. The above main identity
$(A\mathbf{x}, \mathbf{y}) = (\mathbf{x}, A^*\mathbf{y})$ can be used to define the adjoint operator in the abstract
setting, where $A : V \to W$ is an operator acting from one inner product
space to another. Namely, we define $A^* : W \to V$ to be the operator
satisfying
\[ (A\mathbf{x}, \mathbf{y}) = (\mathbf{x}, A^*\mathbf{y}) \qquad \forall \mathbf{x} \in V, \ \forall \mathbf{y} \in W. \]
Why does such an operator exist? We can simply construct it: consider
orthonormal bases $\mathcal{A} = \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ in $V$ and $\mathcal{B} = \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$ in $W$.
If $[A]_{\mathcal{B}\mathcal{A}}$ is the matrix of $A$ with respect to these bases, we define the operator
$A^*$ by defining its matrix $[A^*]_{\mathcal{A}\mathcal{B}}$ as
\[ [A^*]_{\mathcal{A}\mathcal{B}} = \bigl([A]_{\mathcal{B}\mathcal{A}}\bigr)^*. \]
We leave the proof that this indeed gives the adjoint operator as an exercise
for the reader.


Note that the reasoning in Sect. 5.1.1 above implies that the adjoint
operator is unique.

5.1.3. Useful formulas. Below we present the properties of the adjoint
operators (matrices) we will use a lot. We leave the proofs as an exercise for
the reader.


1. $(A + B)^* = A^* + B^*$;

2. $(\alpha A)^* = \overline{\alpha} A^*$;

3. $(AB)^* = B^*A^*$;

4. $(A^*)^* = A$;

5. $(\mathbf{y}, A\mathbf{x}) = (A^*\mathbf{y}, \mathbf{x})$.


5.2. Relation between fundamental subspaces. 


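Theorem 5.1. Let $A : V \to W$ be an operator acting from one inner product
space to another. Then
1. $\operatorname{Ker} A^* = (\operatorname{Ran} A)^\perp$;
2. $\operatorname{Ker} A = (\operatorname{Ran} A^*)^\perp$;
3. $\operatorname{Ran} A = (\operatorname{Ker} A^*)^\perp$;
4. $\operatorname{Ran} A^* = (\operatorname{Ker} A)^\perp$.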


Remark. Earlier in Section 7 of Chapter 2 the fundamental subspaces were
defined (as it is often done in the literature) using $A^T$ instead of $A^*$. Of
course, there is no difference for real matrices, so in the real case the above
theorem gives the geometric description of the fundamental subspaces
defined there.


The geometric interpretation of the fundamental subspaces defined using $A^T$
is presented in Chapter 8 below, see Section 3 there (Theorem 3.7). The
formulas in this theorem are essentially the same as in Theorem 5.1 here,
only the interpretation is a bit different.


Proof of Theorem 5.1. First of all, let us notice that since for a subspace
$E$ we have $(E^\perp)^\perp = E$, the statements 1 and 3 are equivalent. Similarly,
for the same reason, the statements 2 and 4 are equivalent as well. Finally,
statement 2 is exactly statement 1 applied to the operator $A^*$ (here we use
the fact that $(A^*)^* = A$).

So, to prove the theorem we only need to prove statement 1.

We will present 2 proofs of this statement: a “matrix” proof, and an
“invariant”, or “coordinate-free” one.

In the “matrix” proof, we assume that $A$ is an $m \times n$ matrix, i.e. that
$A : \mathbb{F}^n \to \mathbb{F}^m$. The general case can always be reduced to this one by
picking orthonormal bases in $V$ and $W$, and considering the matrix of $A$ in
these bases.

Let $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_n$ be the columns of $A$. Note that $\mathbf{x} \in (\operatorname{Ran} A)^\perp$ if and
only if $\mathbf{x} \perp \mathbf{a}_k$ (i.e. $(\mathbf{x}, \mathbf{a}_k) = 0$) $\forall k = 1, 2, \ldots, n$.


By the definition of the inner product in $\mathbb{F}^m$, that means
\[ 0 = (\mathbf{x}, \mathbf{a}_k) = \mathbf{a}_k^*\mathbf{x} \qquad \forall k = 1, 2, \ldots, n. \]
Since $\mathbf{a}_k^*$ is the row number $k$ of $A^*$, the above $n$ equalities are equivalent to
the equation
\[ A^*\mathbf{x} = \mathbf{0}. \]
So, we proved that $\mathbf{x} \in (\operatorname{Ran} A)^\perp$ if and only if $A^*\mathbf{x} = \mathbf{0}$, and that is exactly
the statement 1.


Now, let us present the “coordinate-free” proof. The inclusion
$\mathbf{x} \in (\operatorname{Ran} A)^\perp$ means that $\mathbf{x}$ is orthogonal to all vectors of the form $A\mathbf{y}$, i.e. that
\[ (\mathbf{x}, A\mathbf{y}) = 0 \qquad \forall \mathbf{y}. \]
Since $(\mathbf{x}, A\mathbf{y}) = (A^*\mathbf{x}, \mathbf{y})$, this identity is equivalent to
\[ (A^*\mathbf{x}, \mathbf{y}) = 0 \qquad \forall \mathbf{y}, \]
and by Lemma 1.4 this happens if and only if $A^*\mathbf{x} = \mathbf{0}$. So we proved that
$\mathbf{x} \in (\operatorname{Ran} A)^\perp$ if and only if $A^*\mathbf{x} = \mathbf{0}$, which is exactly the statement 1 of
the theorem.
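
Statement 1 can be illustrated on a small real example, where $A^* = A^T$. The sketch below is only an illustration, not a proof: the matrix `A`, the vectors `x` and `z`, and the sample vectors are all hypothetical choices made here, and orthogonality to $\operatorname{Ran} A$ is only tested on a few vectors of the form $A\mathbf{y}$.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def apply(M, v):
    # Matrix-vector product M v.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    # Standard real inner product.
    return sum(a * b for a, b in zip(u, v))

A = [[1, 0], [0, 1], [1, 1]]   # hypothetical real 3 x 2 matrix, so A* = A^T
x = [1, 1, -1]                 # chosen so that A^T x = 0
z = [1, 0, 0]                  # a vector that is not in Ker A^T

samples = [[1, 0], [0, 1], [2, -3], [-5, 7]]   # a few test vectors y

x_in_kernel = apply(transpose(A), x) == [0, 0]
x_orthogonal = all(dot(x, apply(A, y)) == 0 for y in samples)

z_in_kernel = apply(transpose(A), z) == [0, 0]
z_orthogonal = all(dot(z, apply(A, y)) == 0 for y in samples)

print(x_in_kernel, x_orthogonal, z_in_kernel, z_orthogonal)
```

The vector `x` lies in $\operatorname{Ker} A^*$ and is orthogonal to every tested vector of $\operatorname{Ran} A$, while `z` fails both conditions, as the theorem predicts.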


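5.3. The “essential” part of a linear transformation. The above theorem
makes the structure of the operator $A$ and the geometry of fundamental
subspaces much more transparent. It follows from this theorem that the
operator $A$ can be represented as a composition of the orthogonal projection onto
$\operatorname{Ran} A^*$ and an isomorphism from $\operatorname{Ran} A^*$ to $\operatorname{Ran} A$.

Indeed, let $\widetilde{A} : \operatorname{Ran} A^* \to \operatorname{Ran} A$ be the restriction of $A$ to the domain
$\operatorname{Ran} A^*$ and the target space $\operatorname{Ran} A$,
\[ \widetilde{A}\mathbf{x} = A\mathbf{x}, \qquad \forall \mathbf{x} \in \operatorname{Ran} A^*. \]
Since $\operatorname{Ker} A = (\operatorname{Ran} A^*)^\perp$, we have
\[ A\mathbf{x} = A P_{\operatorname{Ran} A^*}\mathbf{x} = \widetilde{A} P_{\operatorname{Ran} A^*}\mathbf{x} \qquad \forall \mathbf{x} \in X \]
(here $X$ denotes the domain of $A$); the fact that $\mathbf{x} - P_{\operatorname{Ran} A^*}\mathbf{x} \in (\operatorname{Ran} A^*)^\perp = \operatorname{Ker} A$ is used here. Therefore we
can write
\[ A\mathbf{x} = \widetilde{A} P_{\operatorname{Ran} A^*}\mathbf{x} \qquad \forall \mathbf{x} \in X, \tag{5.1} \]
or, equivalently, $A = \widetilde{A} P_{\operatorname{Ran} A^*}$.


Note also that $\widetilde{A} : \operatorname{Ran} A^* \to \operatorname{Ran} A$ is an invertible transformation.
First we notice that $\operatorname{Ker} \widetilde{A} = \{\mathbf{0}\}$: if $\mathbf{x} \in \operatorname{Ran} A^*$ is such that $\widetilde{A}\mathbf{x} = A\mathbf{x} = \mathbf{0}$,
then $\mathbf{x} \in \operatorname{Ker} A = (\operatorname{Ran} A^*)^\perp$, so $\mathbf{x} \in \operatorname{Ran} A^* \cap (\operatorname{Ran} A^*)^\perp$, thus $\mathbf{x} = \mathbf{0}$. Then
to see that $\widetilde{A}$ is invertible, it is sufficient to show that $\widetilde{A}$ is onto (surjective).
But this immediately follows from (5.1):
\[ \operatorname{Ran} \widetilde{A} = \widetilde{A} \operatorname{Ran} A^* = \widetilde{A} P_{\operatorname{Ran} A^*} X = AX = \operatorname{Ran} A. \]

The isomorphism $\widetilde{A}$ is sometimes called the “essential part” of the
operator $A$ (a non-standard terminology).


The fact that the “essential part” $\widetilde{A} : \operatorname{Ran} A^* \to \operatorname{Ran} A$ of $A$ is an
isomorphism implies the following “complex” rank theorem: $\operatorname{rank} A = \operatorname{rank} A^*$.
But, of course, this theorem also follows from the elementary observation that
complex conjugation does not change the rank of a matrix, $\operatorname{rank} A = \operatorname{rank} \overline{A}$.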

Exercises. 


5.1. Show that for a square matrix $A$ the equality $\det(A^*) = \overline{\det(A)}$ holds.


5.2. Find matrices of orthogonal projections onto all 4 fundamental subspaces of
the matrix
\[
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 3 & 2 \\ 2 & 4 & 3 \end{pmatrix}.
\]
Note that really you need only to compute 2 of the projections. If you pick an
appropriate 2, the other 2 are easy to obtain from them (recall how the projections
onto $E$ and $E^\perp$ are related).


5.3. Let $A$ be an $m \times n$ matrix. Show that $\operatorname{Ker} A = \operatorname{Ker}(A^*A)$.


To do that you need to prove 2 inclusions, $\operatorname{Ker}(A^*A) \subset \operatorname{Ker} A$ and
$\operatorname{Ker} A \subset \operatorname{Ker}(A^*A)$. One of the inclusions is trivial; for the other one use the fact that
\[ \|A\mathbf{x}\|^2 = (A\mathbf{x}, A\mathbf{x}) = (A^*A\mathbf{x}, \mathbf{x}). \]

5.4. Use the equality $\operatorname{Ker} A = \operatorname{Ker}(A^*A)$ to prove that

a) $\operatorname{rank} A = \operatorname{rank}(A^*A)$;


b) If $A\mathbf{x} = \mathbf{0}$ has only the trivial solution, $A$ is left invertible. (You can just
write a formula for a left inverse.)


5.5. Suppose that for a matrix $A$ the matrix $A^*A$ is invertible, so the orthogonal
projection onto $\operatorname{Ran} A$ is given by the formula $A(A^*A)^{-1}A^*$. Can you write formulas
for the orthogonal projections onto the other 3 fundamental subspaces ($\operatorname{Ker} A$,
$\operatorname{Ker} A^*$, $\operatorname{Ran} A^*$)?


5.6. Let a matrix $P$ be self-adjoint ($P^* = P$) and let $P^2 = P$. Show that $P$ is the
matrix of an orthogonal projection. Hint: consider the decomposition $\mathbf{x} = \mathbf{x}_1 + \mathbf{x}_2$,
$\mathbf{x}_1 \in \operatorname{Ran} P$, $\mathbf{x}_2 \perp \operatorname{Ran} P$ and show that $P\mathbf{x}_1 = \mathbf{x}_1$, $P\mathbf{x}_2 = \mathbf{0}$. For one of the
equalities you will need self-adjointness, for the other one the property $P^2 = P$.


6. Isometries and unitary operators. Unitary and orthogonal 
matrices. 


6.1. Main definitions. 


Definition. An operator $U : X \to Y$ is called an isometry if it preserves
the norm,
\[ \|U\mathbf{x}\| = \|\mathbf{x}\| \qquad \forall \mathbf{x} \in X. \]

The following theorem shows that an isometry preserves the inner product.


Theorem 6.1. An operator $U : X \to Y$ is an isometry if and only if it
preserves the inner product, i.e. if and only if
\[ (\mathbf{x}, \mathbf{y}) = (U\mathbf{x}, U\mathbf{y}) \qquad \forall \mathbf{x}, \mathbf{y} \in X. \]


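Proof. The proof uses the polarization identities (Lemma 1.9). For example,
if $X$ is a complex space
\[
\begin{aligned}
(U\mathbf{x}, U\mathbf{y}) &= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha \|U\mathbf{x} + \alpha U\mathbf{y}\|^2 \\
&= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha \|U(\mathbf{x} + \alpha\mathbf{y})\|^2 \\
&= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha \|\mathbf{x} + \alpha\mathbf{y}\|^2 = (\mathbf{x}, \mathbf{y}).
\end{aligned}
\]

Similarly, for a real space $X$
\[
\begin{aligned}
(U\mathbf{x}, U\mathbf{y}) &= \frac14 \bigl( \|U\mathbf{x} + U\mathbf{y}\|^2 - \|U\mathbf{x} - U\mathbf{y}\|^2 \bigr) \\
&= \frac14 \bigl( \|U(\mathbf{x} + \mathbf{y})\|^2 - \|U(\mathbf{x} - \mathbf{y})\|^2 \bigr) \\
&= \frac14 \bigl( \|\mathbf{x} + \mathbf{y}\|^2 - \|\mathbf{x} - \mathbf{y}\|^2 \bigr) = (\mathbf{x}, \mathbf{y}).
\end{aligned}
\]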


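Lemma 6.2. An operator $U : X \to Y$ is an isometry if and only if $U^*U = I$.


Proof. If $U^*U = I$, then by the definition of the adjoint operator
\[ (\mathbf{x}, \mathbf{x}) = (U^*U\mathbf{x}, \mathbf{x}) = (U\mathbf{x}, U\mathbf{x}) \qquad \forall \mathbf{x} \in X. \]
Therefore $\|\mathbf{x}\| = \|U\mathbf{x}\|$, and so $U$ is an isometry.


On the other hand, if $U$ is an isometry, then by the definition of the adjoint
operator and by Theorem 6.1 we have for all $\mathbf{x} \in X$
\[ (U^*U\mathbf{x}, \mathbf{y}) = (U\mathbf{x}, U\mathbf{y}) = (\mathbf{x}, \mathbf{y}) \qquad \forall \mathbf{y} \in X, \]
and therefore by Corollary 1.5 $U^*U\mathbf{x} = \mathbf{x}$. Since this is true for all $\mathbf{x} \in X$, we
have $U^*U = I$.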


The above lemma implies that an isometry is always left invertible (U* 
being a left inverse). 


Definition. An isometry $U : X \to Y$ is called a unitary operator if it is
invertible.


Proposition 6.3. An isometry $U : X \to Y$ is a unitary operator if and
only if $\dim X = \dim Y$.


Proof. Since $U$ is an isometry, it is left invertible, and since $\dim X = \dim Y$,
it is invertible (a left invertible square matrix is invertible).

On the other hand, if $U : X \to Y$ is invertible, $\dim X = \dim Y$ (only
square matrices are invertible, isomorphic spaces have equal dimensions).


A square matrix $U$ is called unitary if $U^*U = I$, i.e. a unitary matrix is
a matrix of a unitary operator acting in $\mathbb{F}^n$.


A unitary matrix with real entries is called an orthogonal matrix. An
orthogonal matrix can be interpreted as a matrix of a unitary operator acting
in the real space $\mathbb{R}^n$.


A few properties of unitary operators:


1. For a unitary transformation $U$, $U^{-1} = U^*$;

2. If $U$ is unitary, $U^* = U^{-1}$ is also unitary;


3. If $U$ is an isometry, and $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is an orthonormal basis, then
$U\mathbf{v}_1, U\mathbf{v}_2, \ldots, U\mathbf{v}_n$ is an orthonormal system. Moreover, if $U$ is
unitary, $U\mathbf{v}_1, U\mathbf{v}_2, \ldots, U\mathbf{v}_n$ is an orthonormal basis.


4. A product of unitary operators is a unitary operator as well.


6.2. Examples. First of all, let us notice that


a matrix $U$ is an isometry if and only if its columns form an
orthonormal system.


This statement can be checked directly by computing the product $U^*U$.
It is easy to check that the columns of the rotation matrix
\[ \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \]
are orthogonal to each other, and that each column has norm 1. Therefore,
the rotation matrix is an isometry, and since it is square, it is unitary. Since
all entries of the rotation matrix are real, it is an orthogonal matrix.
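
The check "compute $U^*U$ and compare with $I$" is easy to automate. The following sketch does this for the rotation matrix at a few sample angles; the angles, the tolerance `tol`, and the helper names `rotation`, `gram` and `is_identity` are choices made here for illustration. Since the entries are floating point numbers, the comparison with the identity is made up to a small tolerance.

```python
import math

def rotation(alpha):
    c, s = math.cos(alpha), math.sin(alpha)
    return [[c, -s], [s, c]]

def gram(U):
    # Entry (i, j) is the inner product of columns i and j, i.e. the matrix U^T U.
    n = len(U[0])
    return [[sum(row[i] * row[j] for row in U) for j in range(n)] for i in range(n)]

def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(len(M)) for j in range(len(M)))

angles = [0.0, 0.3, math.pi / 4, 2.0, -1.1]
checks = [is_identity(gram(rotation(a))) for a in angles]

# For contrast: the columns of this matrix are not orthonormal.
shear_is_isometry = is_identity(gram([[1.0, 1.0], [0.0, 1.0]]))
print(checks, shear_is_isometry)
```

Every rotation passes the test, while the shear matrix used for contrast does not.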

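The next example is more abstract. Let $X$ and $Y$ be inner product
spaces, $\dim X = \dim Y = n$, and let $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n$ and $\mathbf{y}_1, \mathbf{y}_2, \ldots, \mathbf{y}_n$ be
orthonormal bases in $X$ and $Y$ respectively. Define an operator $U : X \to Y$
by
\[ U\mathbf{x}_k = \mathbf{y}_k, \qquad k = 1, 2, \ldots, n. \]
Since for a vector $\mathbf{x} = c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \ldots + c_n\mathbf{x}_n$
\[ \|\mathbf{x}\|^2 = |c_1|^2 + |c_2|^2 + \ldots + |c_n|^2 \]
and
\[
\|U\mathbf{x}\|^2 = \Bigl\| U\Bigl( \sum_{k=1}^n c_k\mathbf{x}_k \Bigr) \Bigr\|^2
= \Bigl\| \sum_{k=1}^n c_k\mathbf{y}_k \Bigr\|^2 = \sum_{k=1}^n |c_k|^2,
\]
one can conclude that $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x} \in X$, so $U$ is a unitary operator.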


6.3. Properties of unitary operators. 


Proposition 6.4. Let $U$ be a unitary matrix. Then


1. $|\det U| = 1$. In particular, for an orthogonal matrix $\det U = \pm 1$;
2. If $\lambda$ is an eigenvalue of $U$, then $|\lambda| = 1$.

Remark. Note that for an orthogonal matrix, an eigenvalue (unlike the
determinant) does not have to be real. Our old friend, the rotation matrix,
gives an example.

Proof of Proposition 6.4. Let $\det U = z$. Since $\det(U^*) = \overline{\det(U)}$, see
Problem 5.1, we have
\[ |z|^2 = \overline{z} z = \det(U^*U) = \det I = 1, \]
so $|\det U| = |z| = 1$. Statement 1 is proved.


To prove statement 2 let us notice that if $U\mathbf{x} = \lambda\mathbf{x}$ then
\[ \|U\mathbf{x}\| = \|\lambda\mathbf{x}\| = |\lambda| \cdot \|\mathbf{x}\|, \]
so $|\lambda| = 1$ (recall that $\|U\mathbf{x}\| = \|\mathbf{x}\|$ and that an eigenvector $\mathbf{x}$ is non-zero).
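
To make the remark about the rotation matrix concrete, here is the computation of its eigenvalues, written out as a worked example (the symbol $R_\alpha$ for the rotation matrix is introduced here only for this illustration):

```latex
\[
\det(R_\alpha - \lambda I)
= \det \begin{pmatrix} \cos\alpha - \lambda & -\sin\alpha \\ \sin\alpha & \cos\alpha - \lambda \end{pmatrix}
= (\cos\alpha - \lambda)^2 + \sin^2\alpha
= \lambda^2 - 2\lambda\cos\alpha + 1 ,
\]
\[
\lambda^2 - 2\lambda\cos\alpha + 1 = 0
\quad\Longrightarrow\quad
\lambda = \cos\alpha \pm i \sin\alpha = e^{\pm i\alpha},
\qquad
|\lambda|^2 = \cos^2\alpha + \sin^2\alpha = 1 .
\]
```

So both eigenvalues have absolute value 1, as statement 2 requires, but they are real only when $\sin\alpha = 0$; the determinant, $\cos^2\alpha + \sin^2\alpha = 1$, is real for every $\alpha$.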


6.4. Unitarily equivalent operators.


Definition. Operators (matrices) $A$ and $B$ are called unitarily equivalent if
there exists a unitary operator $U$ such that $A = UBU^*$.


Since for a unitary $U$ we have $U^{-1} = U^*$, any two unitarily equivalent
matrices are similar as well.


The converse is not true: it is easy to construct a pair of similar matrices
which are not unitarily equivalent.


The following proposition gives a way to construct a counterexample.


Proposition 6.5. A matrix $A$ is unitarily equivalent to a diagonal one if
and only if it has an orthogonal (orthonormal) basis of eigenvectors.


Proof. Let $A = UBU^*$ and let $B\mathbf{x} = \lambda\mathbf{x}$. Then $AU\mathbf{x} = UBU^*U\mathbf{x} =
UB\mathbf{x} = U(\lambda\mathbf{x}) = \lambda U\mathbf{x}$, i.e. $U\mathbf{x}$ is an eigenvector of $A$.


So, let $A$ be unitarily equivalent to a diagonal matrix $D$, i.e. let
$A = UDU^*$. The vectors $\mathbf{e}_k$ of the standard basis are eigenvectors of $D$, so
the vectors $U\mathbf{e}_k$ are eigenvectors of $A$. Since $U$ is unitary, the system
$U\mathbf{e}_1, U\mathbf{e}_2, \ldots, U\mathbf{e}_n$ is an orthonormal basis.


Now let $A$ have an orthogonal basis $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ of eigenvectors.
Dividing each vector $\mathbf{u}_k$ by its norm if necessary, we can always assume that the
system $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$ is an orthonormal basis. Let $D$ be the matrix of $A$ in
the basis $\mathcal{B} = \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$. Clearly, $D$ is a diagonal matrix.

Denote by $U$ the matrix with columns $\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_n$. Since the columns
form an orthonormal basis, $U$ is unitary. The standard change of coordinate
formula implies
\[ A = [A]_{\mathcal{S}\mathcal{S}} = [I]_{\mathcal{S}\mathcal{B}} [A]_{\mathcal{B}\mathcal{B}} [I]_{\mathcal{B}\mathcal{S}} = UDU^{-1}, \]
and since $U$ is unitary, $A = UDU^*$.
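
Two small illustrations of the proposition, both with matrices chosen here as hypothetical examples. First, a matrix with an orthonormal basis of eigenvectors $\frac{1}{\sqrt2}(1, 1)^T$, $\frac{1}{\sqrt2}(1, -1)^T$ (eigenvalues 3 and 1), together with the resulting factorization; second, a matrix that is diagonalizable but has no orthogonal basis of eigenvectors.

```latex
\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
U = \frac{1}{\sqrt2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix},
\]
\[
UDU^* = \frac12
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
\begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
= \frac12 \begin{pmatrix} 4 & 2 \\ 2 & 4 \end{pmatrix} = A .
\]
\[
B = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}: \qquad
B \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
B \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 2 \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad
\left( \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right) = 1 \ne 0 .
\]
```

The matrix $B$ has two distinct eigenvalues, so it is similar to the diagonal matrix with entries 1 and 2. Each eigenspace is one-dimensional, so every eigenvector is a multiple of one of the two vectors above, and no two of them from different eigenspaces are orthogonal. By the proposition, $B$ is therefore not unitarily equivalent to any diagonal matrix.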

Exercises.


6.1. Orthogonally diagonalize the following matrices,
\[
\begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad
\begin{pmatrix} 0 & 2 & 2 \\ 2 & 0 & 2 \\ 2 & 2 & 0 \end{pmatrix},
\]
i.e. for each matrix $A$ find a unitary matrix $U$ and a diagonal matrix $D$ such that
$A = UDU^*$.


6.2. True or false: a matrix is unitarily equivalent to a diagonal one if and only if 
it has an orthogonal basis of eigenvectors. 


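6.3. Prove the polarization identities
\[
(A\mathbf{x}, \mathbf{y}) = \frac14 \bigl[ (A(\mathbf{x} + \mathbf{y}), \mathbf{x} + \mathbf{y}) - (A(\mathbf{x} - \mathbf{y}), \mathbf{x} - \mathbf{y}) \bigr]
\qquad \text{(real case, } A = A^* \text{)},
\]
and
\[
(A\mathbf{x}, \mathbf{y}) = \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha \, (A(\mathbf{x} + \alpha\mathbf{y}), \mathbf{x} + \alpha\mathbf{y})
\qquad \text{(complex case, } A \text{ is arbitrary)}.
\]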


6.4. Show that a product of unitary (orthogonal) matrices is unitary (orthogonal) 
as well. 


6.5. Let $U : X \to X$ be a linear transformation on a finite-dimensional inner
product space. True or false:


a) If $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for all $\mathbf{x} \in X$, then $U$ is unitary.


b) If $\|U\mathbf{e}_k\| = \|\mathbf{e}_k\|$, $k = 1, 2, \ldots, n$ for some orthonormal basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$,
then $U$ is unitary.


Justify your answers with a proof or a counterexample.

6.6. Let $A$ and $B$ be unitarily equivalent $n \times n$ matrices.


a) Prove that $\operatorname{trace}(A^*A) = \operatorname{trace}(B^*B)$.

b) Use a) to prove that
\[ \sum_{j,k=1}^n |A_{j,k}|^2 = \sum_{j,k=1}^n |B_{j,k}|^2. \]


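c) Use b) to prove that the matrices
\[
\begin{pmatrix} 1 & 2 \\ 2 & i \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} i & 4 \\ 1 & 1 \end{pmatrix}
\]
are not unitarily equivalent.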


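6.7. Which of the following pairs of matrices are unitarily equivalent:

a) $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.

b) $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1/2 \\ 1/2 & 0 \end{pmatrix}$.

c) $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.

d) $\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 & 0 \\ 0 & -i & 0 \\ 0 & 0 & i \end{pmatrix}$.

e) $\begin{pmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$.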

Hint: It is easy to eliminate matrices that are not unitarily equivalent: remember
that unitarily equivalent matrices are similar, and trace, determinant and
eigenvalues of similar matrices coincide.


Also, the previous problem helps in eliminating non unitarily equivalent
matrices.


Finally, a matrix is unitarily equivalent to a diagonal one if and only if it has 
an orthogonal basis of eigenvectors. 


6.8. Let $U$ be a $2 \times 2$ orthogonal matrix with $\det U = 1$. Prove that $U$ is a rotation
matrix.


6.9. Let $U$ be a $3 \times 3$ orthogonal matrix with $\det U = 1$. Prove that


a) 1 is an eigenvalue of $U$.


b) If $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$ is an orthonormal basis such that $U\mathbf{v}_1 = \mathbf{v}_1$ (remember that
1 is an eigenvalue), then in this basis the matrix of $U$ is
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix},
\]
where $\alpha$ is some angle.

Hint: Show that since $\mathbf{v}_1$ is an eigenvector of $U$, all entries below 1
must be zero, and since $\mathbf{v}_1$ is also an eigenvector of $U^*$ (why?), all entries
right of 1 also must be zero. Then show that the lower right $2 \times 2$ matrix
is an orthogonal one with determinant 1, and use the previous problem.


7. Rigid motions in $\mathbb{R}^n$


A rigid motion in an inner product space $V$ is a transformation $f : V \to V$
preserving the distance between points, i.e. such that
\[ \|f(\mathbf{x}) - f(\mathbf{y})\| = \|\mathbf{x} - \mathbf{y}\| \qquad \forall \mathbf{x}, \mathbf{y} \in V. \]
Note that in the definition we do not assume that the transformation $f$ is
linear.

Clearly, any unitary transformation is a rigid motion. Another example
of a rigid motion is a translation (shift) by $\mathbf{a} \in V$, $f(\mathbf{x}) = \mathbf{x} + \mathbf{a}$.

The main result of this section is the following theorem, stating that any
rigid motion in a real inner product space is a composition of an orthogonal
transformation and a translation.


Theorem 7.1. Let $f$ be a rigid motion in a real inner product space $X$, and
let $T(\mathbf{x}) := f(\mathbf{x}) - f(\mathbf{0})$. Then $T$ is an orthogonal transformation.


To prove this theorem we need the following simple lemma.

Lemma 7.2. Let $T$ be as defined in Theorem 7.1. Then for all $\mathbf{x}, \mathbf{y} \in X$
1. $\|T\mathbf{x}\| = \|\mathbf{x}\|$;
2. $\|T(\mathbf{x}) - T(\mathbf{y})\| = \|\mathbf{x} - \mathbf{y}\|$;
3. $(T(\mathbf{x}), T(\mathbf{y})) = (\mathbf{x}, \mathbf{y})$.


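Proof. To prove statement 1 notice that
\[ \|T(\mathbf{x})\| = \|f(\mathbf{x}) - f(\mathbf{0})\| = \|\mathbf{x} - \mathbf{0}\| = \|\mathbf{x}\|. \]
Statement 2 follows from the following chain of identities:
\[
\begin{aligned}
\|T(\mathbf{x}) - T(\mathbf{y})\| &= \|(f(\mathbf{x}) - f(\mathbf{0})) - (f(\mathbf{y}) - f(\mathbf{0}))\| \\
&= \|f(\mathbf{x}) - f(\mathbf{y})\| = \|\mathbf{x} - \mathbf{y}\|.
\end{aligned}
\]

An alternative explanation would be that $T$ is a composition of 2 rigid
motions ($f$ followed by the translation by $\mathbf{a} = -f(\mathbf{0})$), and one can easily
see that a composition of rigid motions is a rigid motion. Since $T(\mathbf{0}) = \mathbf{0}$,
and so $\|T(\mathbf{x})\| = \|T(\mathbf{x}) - T(\mathbf{0})\|$, statement 1 can be treated as a particular
case of statement 2.


To prove statement 3, let us notice that in a real inner product space
\[ \|T(\mathbf{x}) - T(\mathbf{y})\|^2 = \|T(\mathbf{x})\|^2 + \|T(\mathbf{y})\|^2 - 2(T(\mathbf{x}), T(\mathbf{y})), \]
and
\[ \|\mathbf{x} - \mathbf{y}\|^2 = \|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 - 2(\mathbf{x}, \mathbf{y}). \]
Recalling that $\|T(\mathbf{x}) - T(\mathbf{y})\| = \|\mathbf{x} - \mathbf{y}\|$ and $\|T(\mathbf{x})\| = \|\mathbf{x}\|$, $\|T(\mathbf{y})\| = \|\mathbf{y}\|$,
we immediately get the desired conclusion.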


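Proof of Theorem 7.1. First of all notice that for all $\mathbf{x} \in X$
\[ \|T(\mathbf{x})\| = \|f(\mathbf{x}) - f(\mathbf{0})\| = \|\mathbf{x} - \mathbf{0}\| = \|\mathbf{x}\|, \]
so $T$ preserves the norm, $\|T\mathbf{x}\| = \|\mathbf{x}\|$.


We would like to say that the identity $\|T\mathbf{x}\| = \|\mathbf{x}\|$ means $T$ is an
isometry, but to be able to say that we need to prove that $T$ is a linear
transformation.


To do that, let us fix an orthonormal basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ in $X$, and let
$\mathbf{b}_k := T(\mathbf{e}_k)$, $k = 1, 2, \ldots, n$. Since $T$ preserves the inner product (statement
3 of Lemma 7.2), we can conclude that $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ is an orthonormal
system. In fact, since $\dim X = n$ (because the basis $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ consists of $n$
vectors), we can conclude that $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ is an orthonormal basis.

Let $\mathbf{x} = \sum_{k=1}^n \alpha_k\mathbf{e}_k$. Recall that by the abstract orthogonal Fourier
decomposition (2.2) we have that $\alpha_k = (\mathbf{x}, \mathbf{e}_k)$. Applying the abstract
orthogonal Fourier decomposition (2.2) to $T(\mathbf{x})$ and the orthonormal basis
$\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$, we get
\[ T(\mathbf{x}) = \sum_{k=1}^n (T(\mathbf{x}), \mathbf{b}_k)\, \mathbf{b}_k. \]
Since
\[ (T(\mathbf{x}), \mathbf{b}_k) = (T(\mathbf{x}), T(\mathbf{e}_k)) = (\mathbf{x}, \mathbf{e}_k) = \alpha_k, \]
we get that
\[ T\Bigl( \sum_{k=1}^n \alpha_k\mathbf{e}_k \Bigr) = \sum_{k=1}^n \alpha_k\mathbf{b}_k. \]

This means that $T$ is a linear transformation whose matrix with respect
to the bases $\mathcal{S} := \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ and $\mathcal{B} := \mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ is the identity matrix,
\[ [T]_{\mathcal{B}\mathcal{S}} = I. \]

An alternative way to show that $T$ is a linear transformation is the
following direct calculation
\[
\begin{aligned}
\|T(\mathbf{x} + \alpha\mathbf{y}) - (T(\mathbf{x}) + \alpha T(\mathbf{y}))\|^2
&= \|(T(\mathbf{x} + \alpha\mathbf{y}) - T(\mathbf{x})) - \alpha T(\mathbf{y})\|^2 \\
&= \|T(\mathbf{x} + \alpha\mathbf{y}) - T(\mathbf{x})\|^2 + \alpha^2\|T(\mathbf{y})\|^2 - 2\alpha(T(\mathbf{x} + \alpha\mathbf{y}) - T(\mathbf{x}), T(\mathbf{y})) \\
&= \|\mathbf{x} + \alpha\mathbf{y} - \mathbf{x}\|^2 + \alpha^2\|\mathbf{y}\|^2 - 2\alpha(T(\mathbf{x} + \alpha\mathbf{y}), T(\mathbf{y})) + 2\alpha(T(\mathbf{x}), T(\mathbf{y})) \\
&= \alpha^2\|\mathbf{y}\|^2 + \alpha^2\|\mathbf{y}\|^2 - 2\alpha(\mathbf{x} + \alpha\mathbf{y}, \mathbf{y}) + 2\alpha(\mathbf{x}, \mathbf{y}) \\
&= 2\alpha^2\|\mathbf{y}\|^2 - 2\alpha(\mathbf{x}, \mathbf{y}) - 2\alpha^2(\mathbf{y}, \mathbf{y}) + 2\alpha(\mathbf{x}, \mathbf{y}) = 0.
\end{aligned}
\]
Therefore
\[ T(\mathbf{x} + \alpha\mathbf{y}) = T(\mathbf{x}) + \alpha T(\mathbf{y}), \]
which implies that $T$ is linear (taking $\mathbf{x} = \mathbf{0}$ or $\alpha = 1$ we get the two properties
from the definition of a linear transformation).

So, $T$ is a linear transformation satisfying $\|T\mathbf{x}\| = \|\mathbf{x}\|$, i.e. $T$ is an
isometry. Since $T : X \to X$, $T$ is a unitary transformation (see Proposition
6.3). That completes the proof, since an orthogonal transformation is simply
a unitary transformation in a real inner product space.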
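
The statement of the theorem can be illustrated numerically in $\mathbb{R}^2$. In the sketch below the rigid motion `f` (a rotation followed by a translation), its parameters, and the test vectors are all hypothetical choices made here; the checks are performed up to floating point tolerance and are, of course, an illustration on a few vectors rather than a proof.

```python
import math

def f(x, angle=0.7, shift=(3.0, -2.0)):
    # Hypothetical rigid motion of R^2: rotate by `angle`, then translate by `shift`.
    c, s = math.cos(angle), math.sin(angle)
    return (c * x[0] - s * x[1] + shift[0], s * x[0] + c * x[1] + shift[1])

def T(x):
    # T(x) := f(x) - f(0), as in Theorem 7.1.
    fx, f0 = f(x), f((0.0, 0.0))
    return (fx[0] - f0[0], fx[1] - f0[1])

def dist(u, v):
    return math.hypot(u[0] - v[0], u[1] - v[1])

x, y, a = (1.0, 2.0), (-3.0, 0.5), 2.5
combo = (x[0] + a * y[0], x[1] + a * y[1])   # the vector x + a y

# f preserves distances ...
rigid = math.isclose(dist(f(x), f(y)), dist(x, y))

# ... and T(x + a y) = T(x) + a T(y), while f itself is not linear.
Tx, Ty, Tc = T(x), T(y), T(combo)
T_linear = all(math.isclose(Tc[i], Tx[i] + a * Ty[i], abs_tol=1e-12) for i in range(2))
f_linear = all(math.isclose(f(combo)[i], f(x)[i] + a * f(y)[i], abs_tol=1e-12) for i in range(2))
print(rigid, T_linear, f_linear)
```

Subtracting $f(\mathbf{0})$ removes the translation part, which is exactly what makes $T$ linear while $f$ is not.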


Exercises. 


7.1. Give an example of a rigid motion $T$ in $\mathbb{C}^n$, $T(\mathbf{0}) = \mathbf{0}$, which is not a linear
transformation.


8. Complexification and decomplexification


This section is probably a bit more abstract than the rest of the chapter, 
and can be skipped at the first reading. 


8.1. Decomplexification. 


8.1.1. Decomplexification of a vector space. Any complex vector space can
be interpreted as a real vector space: we just need to forget that we can multiply
vectors by complex numbers and act as if only multiplication by real numbers
is allowed.


For example, the space $\mathbb{C}^n$ is canonically identified with the real space
$\mathbb{R}^{2n}$: each complex coordinate $z_k = x_k + i y_k$ gives us 2 real ones, $x_k$ and $y_k$.


“Canonically” here means that this is a standard, most natural way of
identifying $\mathbb{C}^n$ and $\mathbb{R}^{2n}$. Note that while the above definition gives us a
canonical way to get real coordinates from complex ones, it does not say
anything about the ordering of the real coordinates.


In fact, there are two standard ways to order the coordinates $x_k$, $y_k$.
One way is to take first the real parts and then the imaginary parts, so the
ordering is $x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n$. The other standard alternative is
the ordering $x_1, y_1, x_2, y_2, \ldots, x_n, y_n$. The material of this section does not
depend on the choice of ordering of coordinates, so the reader does not have
to worry about picking an ordering.

8.1.2. Decomplexification of an inner product. It turns out that if we are
given a complex inner product (in a complex space), we can in a canonical
way get a real inner product from it. To see how we can do that, let us
first consider the above example of $\mathbb{C}^n$ canonically identified with $\mathbb{R}^{2n}$. Let
$(\mathbf{x}, \mathbf{y})_{\mathbb{C}}$ denote the standard inner product in $\mathbb{C}^n$, and $(\mathbf{x}, \mathbf{y})_{\mathbb{R}}$ be the standard
inner product in $\mathbb{R}^{2n}$ (note that the standard inner product in $\mathbb{R}^n$ does not
depend on the ordering of coordinates). Then (see Exercise 8.1 below)
\[ (\mathbf{x}, \mathbf{y})_{\mathbb{R}} = \operatorname{Re}\, (\mathbf{x}, \mathbf{y})_{\mathbb{C}}. \tag{8.1} \]

This formula can be used to canonically define a real inner product from
the complex one in the general situation. Namely, it is an easy exercise to show
that if $(\mathbf{x}, \mathbf{y})_{\mathbb{C}}$ is an inner product in a complex inner product space, then
$(\mathbf{x}, \mathbf{y})_{\mathbb{R}}$ defined by (8.1) is a real inner product (on the corresponding real
space).


Summarizing we can say that


To decomplexify a complex inner product space we simply “forget”
that we can multiply by complex numbers, i.e. we only allow
multiplication by reals. The canonical real inner product in the
decomplexified space is given by formula (8.1).
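
Formula (8.1) can be checked on concrete vectors. The sketch below is a numerical illustration only (the general statement is left to Exercise 8.1); the vectors and the helper names `complex_inner`, `decomplexify` and `real_inner` are hypothetical choices made here. Both standard orderings of the real coordinates are tried, to illustrate that the ordering does not matter.

```python
def complex_inner(x, y):
    # Standard inner product in C^n: sum_k x_k * conj(y_k).
    return sum(a * b.conjugate() for a, b in zip(x, y))

def decomplexify(x, interleaved=False):
    # Real coordinates of a complex vector, in one of the two standard orderings.
    if interleaved:
        return [part for z in x for part in (z.real, z.imag)]   # x1, y1, x2, y2, ...
    return [z.real for z in x] + [z.imag for z in x]            # x1, ..., xn, y1, ..., yn

def real_inner(X, Y):
    # Standard inner product in R^(2n).
    return sum(a * b for a, b in zip(X, Y))

x = [1 + 2j, 3 - 1j, -2j]
y = [2 - 1j, 1j, 4 + 4j]

rhs = complex_inner(x, y).real
lhs_blocks = real_inner(decomplexify(x), decomplexify(y))
lhs_interleaved = real_inner(decomplexify(x, True), decomplexify(y, True))
print(lhs_blocks, lhs_interleaved, rhs)
```

All three numbers agree: the real inner product of the decomplexified vectors equals the real part of the complex inner product, whichever ordering of the coordinates is used.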
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Remark. Any (complex) linear linear transformation on C” (or, more gen- 
erally, on a complex vector space) gives as a real linear transformation: it is 
simply the fact that if T(ax + By) = aTx + BTy holds for a, 8 € C, then 
it of course holds for a, € R. 


The converse is not true, i.e. a (real) linear transformation on the
decomplexification $\mathbb R^{2n}$ of $\mathbb C^n$ does not always give a (complex)
linear transformation of $\mathbb C^n$ (the same holds in the abstract setting).


For example, if one considers the case $n = 1$, then the multiplication by a
complex number $z$ (the general form of a linear transformation in $\mathbb C^1$),
treated as a linear transformation in $\mathbb R^2$, has a very specific structure
(can you describe it?).
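To illustrate why the converse fails, the following sketch tests the condition $T(i\mathbf z) = iT(\mathbf z)$, which every complex linear map must satisfy. The example map (complex conjugation, i.e. $(x, y) \mapsto (x, -y)$ on $\mathbb R^2$) and all function names are assumptions chosen for illustration; they do not come from the text.

```python
def to_real(z):
    # Identify a complex number with a vector in R^2.
    return (z.real, z.imag)

def to_complex(v):
    return complex(v[0], v[1])

def conj_map(v):
    # (x, y) -> (x, -y): a real linear map on R^2 (complex conjugation).
    return (v[0], -v[1])

def mult_by(c):
    # Multiplication by a fixed complex number c, viewed as a map on R^2.
    def T(v):
        return to_real(c * to_complex(v))
    return T

def commutes_with_i(T, samples, tol=1e-12):
    # A complex linear map must satisfy T(i z) = i T(z) for every z.
    for z in samples:
        lhs = to_complex(T(to_real(1j * z)))
        rhs = 1j * to_complex(T(to_real(z)))
        if abs(lhs - rhs) > tol:
            return False
    return True

samples = [1 + 0j, 1j, 2 - 3j]
conj_is_complex_linear = commutes_with_i(conj_map, samples)
mult_is_complex_linear = commutes_with_i(mult_by(2 + 1j), samples)
print(conj_is_complex_linear, mult_is_complex_linear)
```

Conjugation is linear over the reals but fails the test, while multiplication by a complex number passes it.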


8.2. Complexification. We can also do the converse, namely get a complex
inner product space from a real one: in fact, you have probably already done
it before, without paying much attention to it.


Namely, given a real inner product space $\mathbb R^n$ we can obtain a complex
space $\mathbb C^n$ out of it by allowing complex coordinates (with the standard
inner product in both cases). The space $\mathbb R^n$ in this case will be a real
subspace² of $\mathbb C^n$ consisting of vectors with real coordinates.


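Abstractly, this construction can be described as follows: given a real
vector space $X$ we can define its complexification $X_{\mathbb C}$ as the
collection of all pairs $[\mathbf x_1, \mathbf x_2]$, $\mathbf x_1, \mathbf x_2 \in X$, with
addition and multiplication by a real number $\alpha$ defined coordinate-wise,

$$[\mathbf x_1, \mathbf x_2] + [\mathbf y_1, \mathbf y_2] = [\mathbf x_1 + \mathbf y_1, \mathbf x_2 + \mathbf y_2], \qquad \alpha[\mathbf x_1, \mathbf x_2] = [\alpha\mathbf x_1, \alpha\mathbf x_2].$$

If $X = \mathbb R^n$ then the vector $\mathbf x_1$ consists of the real parts of the
complex coordinates of a vector in $\mathbb C^n$, and the vector $\mathbf x_2$ of the
imaginary parts. Thus informally one can write the pair $[\mathbf x_1, \mathbf x_2]$
as $\mathbf x_1 + i\mathbf x_2$.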

To define multiplication by complex numbers we define multiplication
by $i$ as

$$i[\mathbf x_1, \mathbf x_2] = [-\mathbf x_2, \mathbf x_1]$$

(writing $[\mathbf x_1, \mathbf x_2]$ as $\mathbf x_1 + i\mathbf x_2$ we can see that it must be
defined this way) and define multiplication by arbitrary complex numbers
using the second distributive property $(\alpha + \beta)\mathbf v = \alpha\mathbf v + \beta\mathbf v$.

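If, in addition, $X$ is an inner product space we can extend the inner
product to $X_{\mathbb C}$ by

$$\big([\mathbf x_1, \mathbf x_2], [\mathbf y_1, \mathbf y_2]\big)_{X_{\mathbb C}} = (\mathbf x_1, \mathbf y_1)_X + (\mathbf x_2, \mathbf y_2)_X - i(\mathbf x_1, \mathbf y_2)_X + i(\mathbf x_2, \mathbf y_1)_X .$$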
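The pair construction can be tried out directly. The sketch below, with illustrative helper names that are not from the text, represents an element of the complexification of $\mathbb R^n$ as a pair of real vectors, and compares the extended inner product and the rule for multiplication by $i$ with ordinary arithmetic in $\mathbb C^n$.

```python
def dot(u, v):
    # Real inner product in R^n.
    return sum(a * b for a, b in zip(u, v))

def inner_pairs(p, q):
    # ([x1, x2], [y1, y2]) = (x1, y1) + (x2, y2) - i (x1, y2) + i (x2, y1)
    x1, x2 = p
    y1, y2 = q
    return complex(dot(x1, y1) + dot(x2, y2), dot(x2, y1) - dot(x1, y2))

def times_i(p):
    # i [x1, x2] = [-x2, x1]
    x1, x2 = p
    return ([-b for b in x2], list(x1))

def as_complex_vector(p):
    # Informal identification [x1, x2] <-> x1 + i x2.
    x1, x2 = p
    return [complex(a, b) for a, b in zip(x1, x2)]

def inner_complex(x, y):
    # Standard inner product in C^n.
    return sum(a * b.conjugate() for a, b in zip(x, y))

p = ([1.0, 2.0], [0.0, -1.0])   # the vector (1, 2 - i)
q = ([3.0, 1.0], [1.0, 1.0])    # the vector (3 + i, 1 + i)

via_pairs = inner_pairs(p, q)
via_coordinates = inner_complex(as_complex_vector(p), as_complex_vector(q))
print(via_pairs, via_coordinates)
```

For this sample pair the two computations give the same complex number, which is what the coordinate description of the construction predicts.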


The easiest way to see that everything is well defined is to fix a basis (an
orthonormal basis in the case of a real inner product space) and see what
this construction gives us in coordinates. Then we can see that if we treat
the vector $\mathbf x_1$ as the vector consisting of the real parts of complex
coordinates, and the vector $\mathbf x_2$ as the vector consisting of the imaginary
parts of coordinates, then this construction is exactly the standard
complexification of $\mathbb R^n$ (by allowing complex coordinates) described above.

² A real subspace means a set closed with respect to sums and multiplication
by real numbers.


The fact that we can express this construction in a coordinate-free way,
without picking a basis and working with coordinates, means that the result
does not depend on the choice of a basis.


So, the easiest way to think about complexification is probably as follows: 


To construct a complexification of a real vector space X we can 
pick a basis (an orthonormal basis if X is a real inner product 
space) and then work with coordinates, allowing the complex ones. 
The resulting space does not depend on the choice of a basis; we
can get from one coordinate system to another by the standard change
of coordinates formula.


Note, that any linear transformation $T$ in the real space $X$ gives rise to
a linear transformation $T_{\mathbb C}$ in the complexification $X_{\mathbb C}$.


The easiest way to see that is to fix a basis in $X$ (an orthonormal basis if
$X$ is a real inner product space) and to work in a coordinate representation:
in this case $T_{\mathbb C}$ has the same matrix as $T$. In the abstract
representation we can write
$$T_{\mathbb C}[\mathbf x_1, \mathbf x_2] = [T\mathbf x_1, T\mathbf x_2].$$
On the other hand, not all linear transformations in $X_{\mathbb C}$ can be obtained
from the transformations in $X$; if we do complexification in coordinates,
only the transformations with real matrices work.


Note, that this is completely opposite to the situation in the case of 
decomplexification, described in Section 8.1. 


An attentive reader has probably already noticed that the operations of
complexification and decomplexification are not inverse to each other. First,
the space and its complexification have the same dimension, while the
decomplexification of an $n$-dimensional space has dimension $2n$. Moreover, as
we just discussed, the relation between real and complex linear
transformations is completely opposite in these cases.

In the next section we discuss the operation, inverse in some sense to 
decomplexification. 


8.3. Introducing complex structure to a real space. The construction 
described in this section works only for real spaces of even dimension. 


8.3.1. An elementary way to introduce a complex structure. Let $X$ be a real
inner product space of dimension $2n$. We want to invert the decomplexification
procedure to introduce a complex structure on $X$, i.e. to identify this space
with a complex space such that its decomplexification (see Section 8.1) gives
us the original space $X$. The simplest idea is to fix an orthonormal basis in
$X$ and then split the coordinates in this basis into two equal parts.


We then treat one half of the coordinates (say coordinates $x_1, x_2, \ldots, x_n$)
as real parts of complex coordinates, and treat the rest as the imaginary
parts. Then we have to join real and imaginary parts together: for example,
if we treat $x_1, x_2, \ldots, x_n$ as real parts and $x_{n+1}, x_{n+2}, \ldots, x_{2n}$
as imaginary parts, we can define complex coordinates $z_k = x_k + i x_{n+k}$.

Of course, the result will generally depend on the choice of the orthonor- 
mal basis and on the way we split the real coordinates into real and imagi- 
nary parts and on how we join them. 


One can also see from the decomplexification construction described in 
Section 8.1 that all complex structures on a real inner product space X can 
be obtained in this way. 


8.3.2. From elementary to abstract construction of complex structure. The
above construction can be described in an abstract, coordinate-free way.
Namely, let us split the space $X$ as $X = E \oplus E^\perp$, where $E$ is a subspace,
$\dim E = n$ (so $\dim E^\perp = n$ as well), and let $U_0 : E \to E^\perp$ be a unitary
(more precisely, orthogonal, since our spaces are real) transformation.


Note, that if $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n$ is an orthonormal basis in $E$, then
the system $U_0\mathbf v_1, U_0\mathbf v_2, \ldots, U_0\mathbf v_n$ is an orthonormal basis in
$E^\perp$, so
$$\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n, U_0\mathbf v_1, U_0\mathbf v_2, \ldots, U_0\mathbf v_n$$
is an orthonormal basis in the whole space $X$.


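If $x_1, x_2, \ldots, x_{2n}$ are coordinates of a vector $\mathbf x$ in this basis, and
we treat $x_k + i x_{n+k}$, $k = 1, 2, \ldots, n$, as complex coordinates of $\mathbf x$,
then the multiplication by $i$ is represented by the orthogonal transformation
$U$ which is given in the orthogonal basis of subspaces $E$, $E^\perp$ by the block
matrix
$$U = \begin{pmatrix} 0 & -U_0^* \\ U_0 & 0 \end{pmatrix}.$$

This means that
$$i \begin{pmatrix} \mathbf x_1 \\ \mathbf x_2 \end{pmatrix} = U \begin{pmatrix} \mathbf x_1 \\ \mathbf x_2 \end{pmatrix} = \begin{pmatrix} 0 & -U_0^* \\ U_0 & 0 \end{pmatrix} \begin{pmatrix} \mathbf x_1 \\ \mathbf x_2 \end{pmatrix}, \qquad \mathbf x_1 \in E, \ \mathbf x_2 \in E^\perp.$$

Clearly, $U$ is an orthogonal transformation such that $U^2 = -I$. Therefore,
any complex structure on $X$ is given by an orthogonal transformation $U$
satisfying $U^2 = -I$; the transformation $U$ gives us the multiplication by
the imaginary unit $i$.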
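The two properties of the block matrix $U$ can be checked on a small instance. In the sketch below the orthogonal matrix `U0` (a coordinate swap on a two-dimensional $E$) is an arbitrary choice made for illustration, and the helper names are not from the text.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def block_U(U0):
    # Build the 2n x 2n block matrix [[0, -U0^T], [U0, 0]].
    n = len(U0)
    U0t = transpose(U0)
    top = [[0] * n + [-U0t[i][j] for j in range(n)] for i in range(n)]
    bottom = [list(U0[i]) + [0] * n for i in range(n)]
    return top + bottom

U0 = [[0, 1], [1, 0]]          # an orthogonal 2 x 2 matrix (illustrative choice)
U = block_U(U0)

identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
minus_identity = [[-entry for entry in row] for row in identity]

U_squared = matmul(U, U)               # expected to equal -I
UtU = matmul(transpose(U), U)          # expected to equal I (orthogonality)
print(U_squared == minus_identity, UtU == identity)
```

Since the spaces are real, the adjoint $U_0^*$ is just the transpose, which is what `block_U` uses.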
The converse is also true, namely any orthogonal transformation $U$
satisfying $U^2 = -I$ defines a complex structure on a real inner product space
$X$. Let us explain how.


8.3.3. An abstract construction of complex structure. Let us first consider
an abstract explanation. To define a complex structure, we need to define
the multiplication of vectors by complex numbers (initially we can only
multiply by real numbers). In fact we need only to define the multiplication
by $i$; the rest will follow from linearity in the original real space. And the
multiplication by $i$ is given by the orthogonal transformation $U$ satisfying
$U^2 = -I$.

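Namely, if the multiplication by $i$ is given by $U$, $i\mathbf x = U\mathbf x$, then the
complex multiplication must be defined by

$$(\alpha + \beta i)\mathbf x := \alpha\mathbf x + \beta U\mathbf x = (\alpha I + \beta U)\mathbf x, \qquad \alpha, \beta \in \mathbb R, \ \mathbf x \in X. \tag{8.2}$$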


We will use this formula now as the definition of complex multiplication. 


It is not hard to check that for the complex multiplication defined above
by (8.2) all axioms of a complex vector space are satisfied. One can see that,
for example, by using linearity in the real space $X$ and noticing that
with respect to algebraic operations (addition and multiplication) the linear
transformations of the form

$$\alpha I + \beta U, \qquad \alpha, \beta \in \mathbb R,$$

behave in exactly the same way as the complex numbers $\alpha + \beta i$, i.e. such
transformations give us a representation of the field of complex numbers
$\mathbb C$.

This means, first, that a sum and a product of transformations of the form
$\alpha I + \beta U$ is a transformation of the same form, and second, that to get
the coefficients $\alpha$, $\beta$ of the result we can perform the operation on the
corresponding complex numbers and take the real and imaginary parts of the
result. Note, that here we need the identity $U^2 = -I$, but we do not need the
fact that $U$ is an orthogonal transformation.
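The claim that the transformations $\alpha I + \beta U$ add and multiply like complex numbers can be illustrated on the smallest case. The sketch below assumes $X = \mathbb R^2$ with $U$ the rotation through $\pi/2$ (an illustrative choice satisfying $U^2 = -I$); the function names are hypothetical.

```python
U = [[0, -1], [1, 0]]            # satisfies U^2 = -I on R^2
I2 = [[1, 0], [0, 1]]

def rep(z):
    # The transformation alpha*I + beta*U attached to z = alpha + beta*i.
    a, b = z.real, z.imag
    return [[a * I2[i][j] + b * U[i][j] for j in range(2)] for i in range(2)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def close(A, B, tol=1e-12):
    return all(abs(A[i][j] - B[i][j]) <= tol for i in range(2) for j in range(2))

z, w = 2 + 3j, 1 - 1j
product_ok = close(matmul(rep(z), rep(w)), rep(z * w))   # product matches z*w
sum_ok = close(matadd(rep(z), rep(w)), rep(z + w))       # sum matches z+w
print(product_ok, sum_ok)
```

Only the identity $U^2 = -I$ is used in the product check, in line with the remark above that orthogonality of $U$ is not needed at this step.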

Thus, we got the structure of a complex vector space. To get a complex 
inner product space we need to introduce complex inner product, such that 
the original real inner product is the real part of it. 

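We really do not have any choice here: noticing that for a complex inner
product

$$\operatorname{Im}\,(\mathbf x, \mathbf y)_{\mathbb C} = \operatorname{Re}\big[-i(\mathbf x, \mathbf y)_{\mathbb C}\big] = \operatorname{Re}\,(\mathbf x, i\mathbf y)_{\mathbb C},$$

we see that the only way to define the complex inner product is

$$(\mathbf x, \mathbf y)_{\mathbb C} = (\mathbf x, \mathbf y)_{\mathbb R} + i(\mathbf x, U\mathbf y)_{\mathbb R}. \tag{8.3}$$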
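Let us show that this is indeed an inner product. We will need the fact
that $U^* = -U$, see Exercise 8.4 below (by $U^*$ here we mean the adjoint with
respect to the original real inner product).


To show that $(\mathbf y, \mathbf x)_{\mathbb C} = \overline{(\mathbf x, \mathbf y)_{\mathbb C}}$ we use the
identity $U^* = -U$ and symmetry of the real inner product:

\begin{align*}
(\mathbf y, \mathbf x)_{\mathbb C} &= (\mathbf y, \mathbf x)_{\mathbb R} + i(\mathbf y, U\mathbf x)_{\mathbb R} \\
&= (\mathbf x, \mathbf y)_{\mathbb R} + i(U\mathbf x, \mathbf y)_{\mathbb R} \\
&= (\mathbf x, \mathbf y)_{\mathbb R} - i(\mathbf x, U\mathbf y)_{\mathbb R} \\
&= \overline{(\mathbf x, \mathbf y)_{\mathbb R} + i(\mathbf x, U\mathbf y)_{\mathbb R}} \\
&= \overline{(\mathbf x, \mathbf y)_{\mathbb C}} .
\end{align*}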


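To prove the linearity of the complex inner product, let us first notice
that $(\mathbf x, \mathbf y)_{\mathbb C}$ is real linear in the first (in fact in each) argument,
i.e. that $(\alpha\mathbf x + \beta\mathbf y, \mathbf z)_{\mathbb C} = \alpha(\mathbf x, \mathbf z)_{\mathbb C} + \beta(\mathbf y, \mathbf z)_{\mathbb C}$
for $\alpha, \beta \in \mathbb R$; this is true because each summand in the right side of
(8.3) is real linear in each argument.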

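Using real linearity of $(\mathbf x, \mathbf y)_{\mathbb C}$ and the identity $U^* = -U$ (which
implies that $(U\mathbf x, \mathbf y)_{\mathbb R} = -(\mathbf x, U\mathbf y)_{\mathbb R}$) together with the
orthogonality of $U$, we get the following chain of equalities

\begin{align*}
\big((\alpha I + \beta U)\mathbf x, \mathbf y\big)_{\mathbb C} &= \alpha(\mathbf x, \mathbf y)_{\mathbb C} + \beta(U\mathbf x, \mathbf y)_{\mathbb C} \\
&= \alpha(\mathbf x, \mathbf y)_{\mathbb C} + \beta\big[(U\mathbf x, \mathbf y)_{\mathbb R} + i(U\mathbf x, U\mathbf y)_{\mathbb R}\big] \\
&= \alpha(\mathbf x, \mathbf y)_{\mathbb C} + \beta\big[-(\mathbf x, U\mathbf y)_{\mathbb R} + i(\mathbf x, \mathbf y)_{\mathbb R}\big] \\
&= \alpha(\mathbf x, \mathbf y)_{\mathbb C} + \beta i\big[(\mathbf x, \mathbf y)_{\mathbb R} + i(\mathbf x, U\mathbf y)_{\mathbb R}\big] \\
&= \alpha(\mathbf x, \mathbf y)_{\mathbb C} + \beta i(\mathbf x, \mathbf y)_{\mathbb C} = (\alpha + \beta i)(\mathbf x, \mathbf y)_{\mathbb C},
\end{align*}

which proves complex linearity.
Finally, to prove non-negativity of $(\mathbf x, \mathbf x)_{\mathbb C}$ let us notice (see
Exercise 8.3 below) that $(\mathbf x, U\mathbf x)_{\mathbb R} = 0$, so

$$(\mathbf x, \mathbf x)_{\mathbb C} = (\mathbf x, \mathbf x)_{\mathbb R} = \|\mathbf x\|^2 \ge 0.$$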
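The three properties just proved can be spot-checked numerically. This is a minimal sketch assuming $X = \mathbb R^2$ and $U$ equal to the rotation through $\pi/2$; the names `inner_c` (formula (8.3)) and `times` (formula (8.2)) are illustrative.

```python
U = [[0.0, -1.0], [1.0, 0.0]]    # orthogonal, U^2 = -I (illustrative choice)

def apply(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def inner_c(x, y):
    # Formula (8.3): (x, y)_C = (x, y)_R + i (x, U y)_R.
    return complex(dot(x, y), dot(x, apply(U, y)))

def times(c, x):
    # Formula (8.2): (alpha + beta i) x = alpha x + beta U x.
    Ux = apply(U, x)
    return [c.real * a + c.imag * b for a, b in zip(x, Ux)]

x, y, c = [1.0, 2.0], [3.0, -1.0], 2 - 3j

hermitian = abs(inner_c(y, x) - inner_c(x, y).conjugate()) < 1e-12
linear = abs(inner_c(times(c, x), y) - c * inner_c(x, y)) < 1e-12
positive = abs(inner_c(x, x) - dot(x, x)) < 1e-12
print(hermitian, linear, positive)
```

With this choice of $U$ the construction identifies the vector $(x_1, x_2)$ with the complex number $x_1 + i x_2$, and (8.3) becomes the usual product $z\overline{w}$.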


8.3.4. The abstract construction via the elementary one. For a reader who is
not comfortable with such “high brow” and abstract proof, there is another,
more hands-on, explanation.


Namely, it can be shown, see Exercise 8.5 below, that there exists a
subspace $E$, $\dim E = n$ (recall that $\dim X = 2n$), such that the matrix of $U$
with respect to the decomposition $X = E \oplus E^\perp$ is given by

$$U = \begin{pmatrix} 0 & -U_0^* \\ U_0 & 0 \end{pmatrix},$$

where $U_0 : E \to E^\perp$ is some orthogonal transformation.
Let $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n$ be an orthonormal basis in $E$. Then the system
$U_0\mathbf v_1, U_0\mathbf v_2, \ldots, U_0\mathbf v_n$ is an orthonormal basis in $E^\perp$, so
$$\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n, U_0\mathbf v_1, U_0\mathbf v_2, \ldots, U_0\mathbf v_n$$
is an orthonormal basis in the whole space $X$. Considering the coordinates
$x_1, x_2, \ldots, x_{2n}$ in this basis and treating $x_k + i x_{n+k}$ as complex
coordinates, we get an elementary, “coordinate” way of defining complex
structure, which was already described above. But if we look carefully, we see
that multiplication by $i$ is given by the transformation $U$: it is trivial for
$\mathbf x \in E$ and for $\mathbf y \in E^\perp$, and so it is true for all real linear
combinations $\alpha\mathbf x + \beta\mathbf y$, i.e. for all vectors in $X$.


But that means that the abstract introduction of complex structure
and the corresponding elementary approach give us the same result! And
since the elementary approach clearly gives us a complex structure, the
abstract approach gives us the same complex structure.

Exercises. 


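8.1. Prove formula (8.1). Namely, show that if

$$\mathbf x = (z_1, z_2, \ldots, z_n)^T, \qquad \mathbf y = (w_1, w_2, \ldots, w_n)^T,$$

$z_k = x_k + i y_k$, $w_k = u_k + i v_k$, $x_k, y_k, u_k, v_k \in \mathbb R$, then

$$\operatorname{Re}\Big(\sum_{k=1}^n z_k \overline{w}_k\Big) = \sum_{k=1}^n x_k u_k + \sum_{k=1}^n y_k v_k.$$


8.2. Show that if $(\mathbf x, \mathbf y)_{\mathbb C}$ is an inner product in a complex inner
product space, then $(\mathbf x, \mathbf y)_{\mathbb R}$ defined by (8.1) is a real inner product.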


8.3. Let $U$ be an orthogonal transformation (in a real inner product space $X$),
satisfying $U^2 = -I$. Prove that for all $\mathbf x \in X$

$$U\mathbf x \perp \mathbf x.$$


8.4. Show, that if $U$ is an orthogonal transformation satisfying $U^2 = -I$, then
$U^* = -U$.


8.5. Let $U$ be an orthogonal transformation in a real inner product space $X$,
satisfying $U^2 = -I$. Show that in this case $\dim X = 2n$, and that there exists
a subspace $E \subset X$, $\dim E = n$, and an orthogonal transformation
$U_0 : E \to E^\perp$ such that $U$ in the decomposition $X = E \oplus E^\perp$ is given by
the block matrix

$$U = \begin{pmatrix} 0 & -U_0^* \\ U_0 & 0 \end{pmatrix}.$$

This statement can be easily obtained from Theorem 5.1 of Chapter 6, if one notes
that the only rotations $R_\alpha$ in $\mathbb R^2$ satisfying $R_\alpha^2 = -I$ are rotations
through $\alpha = \pm\pi/2$.


However, one can find an elementary proof here, not using this theorem. For
example, the statement is trivial if $\dim X = 2$: in this case we can take for $E$
any one-dimensional subspace, see Exercise 8.3.
Then it is not hard to show that such an operator $U$ does not exist in
$\mathbb R^3$, and one can use induction in $\dim X$ to complete the proof.


Chapter 6 


Structure of operators in inner product spaces.


In this chapter we are again assuming that all spaces are finite-dimensional.
Again, we are dealing only with complex or real spaces; the theory of inner
product spaces does not apply to spaces over general fields. When we do
not mention which space we are in, everything works for both complex and
real spaces.

To avoid writing essentially the same formulas twice we will use the
notation for the complex case: in the real case it gives correct, although
sometimes a bit more complicated, formulas.


1. Upper triangular (Schur) representation of an operator. 


Theorem 1.1. Let $A : X \to X$ be an operator acting in a complex inner
product space. There exists an orthonormal basis $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ in $X$
such that the matrix of $A$ in this basis is upper triangular.

In other words, any $n \times n$ matrix $A$ can be represented as $A = UTU^*$,
where $U$ is a unitary, and $T$ is an upper triangular matrix.


Proof. We prove the theorem using induction in $\dim X$. If $\dim X = 1$
the theorem is trivial, since any $1 \times 1$ matrix is upper triangular.


Suppose we proved that the theorem is true if $\dim X = n - 1$, and we
want to prove it for $\dim X = n$.
Let $\lambda_1$ be an eigenvalue of $A$, and let $\mathbf u_1$, $\|\mathbf u_1\| = 1$, be a
corresponding eigenvector, $A\mathbf u_1 = \lambda_1\mathbf u_1$. Denote $E = \mathbf u_1^\perp$, and let
$\mathbf v_2, \ldots, \mathbf v_n$ be some orthonormal basis in $E$ (clearly,
$\dim E = \dim X - 1 = n - 1$), so $\mathbf u_1, \mathbf v_2, \ldots, \mathbf v_n$ is an orthonormal
basis in $X$. In this basis the matrix of $A$ has the form

$$\begin{pmatrix} \lambda_1 & * \\ \mathbf 0 & A_1 \end{pmatrix}; \tag{1.1}$$

here all entries below $\lambda_1$ are zeroes, and $*$ means that we do not care what
entries are in the first row to the right of $\lambda_1$.

We do care enough about the lower right $(n-1) \times (n-1)$ block to give
it a name: we denote it as $A_1$.

Note, that $A_1$ defines a linear transformation in $E$, and since
$\dim E = n - 1$, the induction hypothesis implies that there exists an
orthonormal basis (let us denote it as $\mathbf u_2, \ldots, \mathbf u_n$) in which the matrix
of $A_1$ is upper triangular.

So, the matrix of $A$ in the orthonormal basis $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ has the
form (1.1), where the matrix $A_1$ is upper triangular. Therefore, the matrix of
$A$ in this basis is upper triangular as well.
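For a $2 \times 2$ matrix the induction in the proof reduces to a single step, which can be carried out by hand. The sketch below is an illustration under that assumption only: `schur_2x2` is a hypothetical helper, not a general algorithm, and it finds the eigenvalue from the quadratic characteristic equation.

```python
import cmath
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    # Conjugate transpose A*.
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def schur_2x2(A):
    """Return (U, T) with U unitary, T upper triangular and A = U T U*."""
    (a, b), (c, d) = A
    if abs(c) < 1e-12:
        # Already upper triangular: take U = I.
        return [[1, 0], [0, 1]], [list(A[0]), list(A[1])]
    # An eigenvalue: a root of t^2 - (a + d) t + (a d - b c) = 0.
    tr, det = a + d, a * d - b * c
    lam = (tr + cmath.sqrt(tr * tr - 4 * det)) / 2
    # Eigenvector read off the second row of (A - lam I) v = 0; nonzero as c != 0.
    v = (lam - d, c)
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    p, q = v[0] / norm, v[1] / norm       # u_1 = (p, q), a unit eigenvector
    # u_2 = (-conj(q), conj(p)) is a unit vector orthogonal to u_1.
    U = [[p, -complex(q).conjugate()], [q, complex(p).conjugate()]]
    T = matmul(matmul(adjoint(U), A), U)  # matrix of A in the basis u_1, u_2
    return U, T

A = [[1, 2], [3, 2]]
U, T = schur_2x2(A)
print(abs(T[1][0]) < 1e-12)   # the entry below the diagonal vanishes
```

Calling `schur_2x2([[0, -1], [1, 0]])` on the rotation through $\pi/2$ returns matrices with genuinely complex entries, even though the input is real.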


Remark. Note, that the subspace $E = \mathbf u_1^\perp$ introduced in the proof is not
invariant under $A$, i.e. the inclusion $AE \subset E$ does not necessarily hold.
That means that $A_1$ is not a part of $A$, it is some operator constructed from $A$.

Note also, that $AE \subset E$ if and only if all entries denoted by $*$ (i.e. all
entries in the first row, except $\lambda_1$) are zero.


Remark. Note, that even if we start from a real matrix $A$, the matrices $U$
and $T$ can have complex entries. The rotation matrix

$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}, \qquad \alpha \ne k\pi, \ k \in \mathbb Z,$$

is not unitarily equivalent (not even similar) to a real upper triangular
matrix. Indeed, eigenvalues of this matrix are complex, and the eigenvalues of
an upper triangular matrix are its diagonal entries.


Remark. An analogue of Theorem 1.1 can be stated and proved for an
arbitrary vector space, without requiring it to have an inner product. In
this case the theorem claims that any operator has an upper triangular
form in some basis. A proof can be modeled after the proof of Theorem 1.1.
An alternative way is to equip $V$ with an inner product by fixing a basis in
$V$ and declaring it to be an orthonormal one, see Problem 2.4 in Chapter 5.
Note, that the version for inner product spaces (Theorem 1.1) is stronger
than the one for the vector spaces, because it says that we always can find
an orthonormal basis, not just a basis.


The following theorem is a real-valued version of Theorem 1.1.


Theorem 1.2. Let $A : X \to X$ be an operator acting in a real inner product
space. Suppose that all eigenvalues of $A$ are real (meaning that $A$ has exactly
$n = \dim X$ real eigenvalues, counting multiplicities). Then there exists an
orthonormal basis $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ in $X$ such that the matrix of $A$ in
this basis is upper triangular.

In other words, any real $n \times n$ matrix $A$ with all real eigenvalues can be
represented as $A = UTU^* = UTU^T$, where $U$ is an orthogonal, and $T$ is a
real upper triangular matrix.


Proof. To prove the theorem we just need to analyze the proof of Theorem
1.1. Let us assume (we can always do this without loss of generality) that
the operator (matrix) $A$ acts in $\mathbb R^n$.

Suppose the theorem is true for $(n-1) \times (n-1)$ matrices. As in the
proof of Theorem 1.1 let $\lambda_1$ be a real eigenvalue of $A$, let $\mathbf u_1 \in \mathbb R^n$,
$\|\mathbf u_1\| = 1$, be a corresponding eigenvector, and let $\mathbf v_2, \ldots, \mathbf v_n$ be an
orthonormal system (in $\mathbb R^n$) such that $\mathbf u_1, \mathbf v_2, \ldots, \mathbf v_n$ is an
orthonormal basis in $\mathbb R^n$.

The matrix of $A$ in this basis has form (1.1), where $A_1$ is some real
matrix.

If we can prove that the matrix $A_1$ has only real eigenvalues, then we are
done. Indeed, then by the induction hypothesis there exists an orthonormal
basis $\mathbf u_2, \ldots, \mathbf u_n$ in $E = \mathbf u_1^\perp$ such that the matrix of $A_1$ in this
basis is upper triangular, so the matrix of $A$ in the basis
$\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ is also upper triangular.

To show that $A_1$ has only real eigenvalues, let us notice that
$$\det(A - \lambda I) = (\lambda_1 - \lambda)\det(A_1 - \lambda I)$$
(take the cofactor expansion in the first column, for example), and so any
eigenvalue of $A_1$ is also an eigenvalue of $A$. But $A$ has only real eigenvalues!


Exercises. 


1.1. Use the upper triangular representation of an operator to give an alternative 
proof of the fact that determinant is the product and the trace is the sum of 
eigenvalues counting multiplicities. 


2. Spectral theorem for self-adjoint and normal operators. 


In this section we deal with matrices (operators) which are unitarily equiv- 
alent to diagonal matrices. 
Let us recall that an operator is called self-adjoint if $A = A^*$. A matrix
of a self-adjoint operator (in some orthonormal basis), i.e. a matrix satisfying
$A^* = A$, is called a Hermitian matrix.

The terms self-adjoint and Hermitian essentially mean the same. Usu- 
ally people say self-adjoint when speaking about operators (transforma- 
tions), and Hermitian when speaking about matrices. We will try to follow 
this convention, but since we often do not distinguish between operators and 
their matrices, we will sometimes mix both terms. 


Theorem 2.1. Let $A = A^*$ be a self-adjoint operator in an inner product
space $X$ (the space can be complex or real). Then all eigenvalues of $A$ are
real, and there exists an orthonormal basis of eigenvectors of $A$ in $X$.


This theorem can be restated in matrix form as follows 


Theorem 2.2. Let $A = A^*$ be a self-adjoint (and therefore square) matrix.
Then $A$ can be represented as

$$A = UDU^*,$$

where $U$ is a unitary matrix and $D$ is a diagonal matrix with real entries.


Moreover, if the matrix $A$ is real, the matrix $U$ can be chosen to be real
(i.e. orthogonal).


Proof. To prove Theorems 2.1 and 2.2 let us first apply Theorem 1.1
(Theorem 1.2 if $X$ is a real space) to find an orthonormal basis in $X$ such
that the matrix of $A$ in this basis is upper triangular. Now let us ask
ourselves a question: What upper triangular matrices are self-adjoint?

The answer is immediate: an upper triangular matrix is self-adjoint if
and only if it is a diagonal matrix with real entries. Theorem 2.1 (and so
Theorem 2.2) is proved.
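A small numerical illustration of Theorem 2.2 for a real symmetric $2 \times 2$ matrix follows. Note that the sketch does not use the triangularization argument of the proof: it assumes the classical Jacobi rotation angle, which is a different (standard) technique swapped in here because it gives a closed form in the $2 \times 2$ case. The function name is hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def diagonalize_symmetric_2x2(A):
    """Return (U, D) with U a rotation, D diagonal and A = U D U^T."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    # Jacobi rotation angle: chosen so that the off-diagonal entry of U^T A U is 0.
    theta = 0.5 * math.atan2(2 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    D = matmul(matmul(transpose(U), A), U)
    return U, D

A = [[2.0, 1.0], [1.0, 2.0]]
U, D = diagonalize_symmetric_2x2(A)
reconstructed = matmul(matmul(U, D), transpose(U))
print(D[0][0], D[1][1])   # the (real) eigenvalues of A
```

The columns of `U` form an orthonormal basis of eigenvectors, and `reconstructed` recovers `A` up to rounding.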


Remark. In many textbooks only real matrices are considered, and The- 
orem 2.2 is often called the “Spectral Theorem for symmetric matrices”. 
However, we should emphasize that the conclusion of Theorem 2.2 fails for 
complex symmetric matrices: the theorem holds for Hermitian matrices, and 
in particular for real symmetric matrices. 


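Let us give an independent proof of the fact that eigenvalues of a
self-adjoint operator are real. Let $A = A^*$ and $A\mathbf x = \lambda\mathbf x$, $\mathbf x \ne \mathbf 0$. Then

$$(A\mathbf x, \mathbf x) = (\lambda\mathbf x, \mathbf x) = \lambda(\mathbf x, \mathbf x) = \lambda\|\mathbf x\|^2.$$
On the other hand,
$$(A\mathbf x, \mathbf x) = (\mathbf x, A^*\mathbf x) = (\mathbf x, A\mathbf x) = (\mathbf x, \lambda\mathbf x) = \overline\lambda(\mathbf x, \mathbf x) = \overline\lambda\|\mathbf x\|^2,$$
so $\lambda\|\mathbf x\|^2 = \overline\lambda\|\mathbf x\|^2$. Since $\|\mathbf x\| \ne 0$ ($\mathbf x \ne \mathbf 0$), we can
conclude $\lambda = \overline\lambda$, so $\lambda$ is real.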

It also follows from Theorem 2.1 that eigenspaces of a self-adjoint oper- 
ator are orthogonal. Let us give an alternative proof of this result. 


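Proposition 2.3. Let $A = A^*$ be a self-adjoint operator, and let $\mathbf u$, $\mathbf v$ be
its eigenvectors, $A\mathbf u = \lambda\mathbf u$, $A\mathbf v = \mu\mathbf v$. Then, if $\lambda \ne \mu$, the
eigenvectors $\mathbf u$ and $\mathbf v$ are orthogonal.


Proof. This proposition follows from the spectral theorem (Theorem 2.1),
but here we are giving a direct proof. Namely,

$$(A\mathbf u, \mathbf v) = (\lambda\mathbf u, \mathbf v) = \lambda(\mathbf u, \mathbf v).$$
On the other hand
$$(A\mathbf u, \mathbf v) = (\mathbf u, A^*\mathbf v) = (\mathbf u, A\mathbf v) = (\mathbf u, \mu\mathbf v) = \overline\mu(\mathbf u, \mathbf v) = \mu(\mathbf u, \mathbf v)$$
(the last equality holds because eigenvalues of a self-adjoint operator are
real), so $\lambda(\mathbf u, \mathbf v) = \mu(\mathbf u, \mathbf v)$. If $\lambda \ne \mu$ it is possible only if
$(\mathbf u, \mathbf v) = 0$.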


Now let us try to find what matrices are unitarily equivalent to a diagonal 
one. It is easy to check that for a diagonal matrix D 
D*D = DD*. 


Therefore A*A = AA* if the matrix of A in some orthonormal basis is 
diagonal. 


Definition. An operator (matrix) N is called normal if N*N = NN*. 


Clearly, any self-adjoint operator ($A^*A = AA^*$) is normal. Also, any
unitary operator $U : X \to X$ is normal since $U^*U = UU^* = I$.

Note, that a normal operator is an operator acting in one space, not from one 
space to another. So, if U is a unitary operator acting from one space to another, 
we cannot say that U is normal. 


Theorem 2.4. Any normal operator $N$ in a complex vector space has an
orthonormal basis of eigenvectors.
In other words, any matrix $N$ satisfying $N^*N = NN^*$ can be represented
as
$$N = UDU^*,$$
where $U$ is a unitary matrix, and $D$ is a diagonal one.

Remark. Note, that in the above theorem even if $N$ is a real matrix, we
did not claim that matrices $U$ and $D$ are real. Moreover, it can be easily
shown, that if $D$ is real, $N$ must be self-adjoint.
Proof of Theorem 2.4. To prove Theorem 2.4 we apply Theorem 1.1 to
get an orthonormal basis, such that the matrix of $N$ in this basis is upper
triangular. To complete the proof of the theorem we only need to show that
an upper triangular normal matrix must be diagonal.

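We will prove this using induction in the dimension of the matrix. The case
of a $1 \times 1$ matrix is trivial, since any $1 \times 1$ matrix is diagonal.

Suppose we have proved that any $(n-1) \times (n-1)$ upper triangular
normal matrix is diagonal, and we want to prove it for $n \times n$ matrices. Let
$N$ be an $n \times n$ upper triangular normal matrix. We can write it as

$$N = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ 0 & & & \\ \vdots & & N_1 & \\ 0 & & & \end{pmatrix},$$

where $N_1$ is an upper triangular $(n-1) \times (n-1)$ matrix.
Let us compare upper left entries (first row first column) of $N^*N$ and
$NN^*$. Direct computation shows that

$$(N^*N)_{1,1} = \overline{a}_{1,1}a_{1,1} = |a_{1,1}|^2$$

and
$$(NN^*)_{1,1} = |a_{1,1}|^2 + |a_{1,2}|^2 + \ldots + |a_{1,n}|^2.$$
So, $(N^*N)_{1,1} = (NN^*)_{1,1}$ if and only if $a_{1,2} = \ldots = a_{1,n} = 0$.
Therefore, the matrix $N$ has the form
$$N = \begin{pmatrix} a_{1,1} & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & N_1 & \\ 0 & & & \end{pmatrix}.$$
It follows from the above representation that
$$N^*N = \begin{pmatrix} |a_{1,1}|^2 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & N_1^*N_1 & \\ 0 & & & \end{pmatrix}, \qquad NN^* = \begin{pmatrix} |a_{1,1}|^2 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & N_1N_1^* & \\ 0 & & & \end{pmatrix},$$
so $N_1^*N_1 = N_1N_1^*$. That means the matrix $N_1$ is also normal, and by the
induction hypothesis it is diagonal. So the matrix $N$ is also diagonal.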
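The key step of the proof (an upper triangular matrix with a nonzero entry above the diagonal cannot be normal) is easy to observe numerically. The sketch below uses three sample matrices chosen for illustration; `is_normal` is a hypothetical helper that simply compares $N^*N$ with $NN^*$.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(A):
    # Conjugate transpose A*.
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]

def is_normal(N, tol=1e-12):
    # N is normal when N*N = NN*.
    left = matmul(adjoint(N), N)
    right = matmul(N, adjoint(N))
    n = len(N)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(n) for j in range(n))

triangular = [[1, 1], [0, 2]]          # upper triangular but not diagonal
rotation = [[0, -1], [1, 0]]           # orthogonal, hence normal
hermitian = [[2, 1j], [-1j, 3]]        # self-adjoint, hence normal

results = (is_normal(triangular), is_normal(rotation), is_normal(hermitian))
print(results)
```

For `triangular`, the upper left entries of $N^*N$ and $NN^*$ are $1$ and $1 + 1 = 2$ respectively, which is exactly the comparison made in the proof.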


The following proposition gives a very useful characterization of normal 
operators. 


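Proposition 2.5. An operator $N : X \to X$ is normal if and only if
$$\|N\mathbf x\| = \|N^*\mathbf x\| \qquad \forall \mathbf x \in X.$$

Proof. Let $N$ be normal, $N^*N = NN^*$. Then
$$\|N\mathbf x\|^2 = (N\mathbf x, N\mathbf x) = (N^*N\mathbf x, \mathbf x) = (NN^*\mathbf x, \mathbf x) = (N^*\mathbf x, N^*\mathbf x) = \|N^*\mathbf x\|^2,$$
so $\|N\mathbf x\| = \|N^*\mathbf x\|$.
Now let
$$\|N\mathbf x\| = \|N^*\mathbf x\| \qquad \forall \mathbf x \in X.$$
The Polarization Identities (Lemma 1.9 in Chapter 5) imply that for all
$\mathbf x, \mathbf y \in X$

\begin{align*}
(N^*N\mathbf x, \mathbf y) = (N\mathbf x, N\mathbf y) &= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha\|N\mathbf x + \alpha N\mathbf y\|^2 \\
&= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha\|N(\mathbf x + \alpha\mathbf y)\|^2 \\
&= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha\|N^*(\mathbf x + \alpha\mathbf y)\|^2 \\
&= \frac14 \sum_{\alpha = \pm 1, \pm i} \alpha\|N^*\mathbf x + \alpha N^*\mathbf y\|^2 \\
&= (N^*\mathbf x, N^*\mathbf y) = (NN^*\mathbf x, \mathbf y),
\end{align*}
and therefore (see Corollary 1.6) $N^*N = NN^*$.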


Exercises. 
2.1. True or false:

a) Every unitary operator $U : X \to X$ is normal.
b) A matrix is unitary if and only if it is invertible.
c) If two matrices are unitarily equivalent, then they are also similar.
d) The sum of self-adjoint operators is self-adjoint.
e) The adjoint of a unitary operator is unitary.
f) The adjoint of a normal operator is normal.
g) If all eigenvalues of a linear operator are 1, then the operator must be
unitary or orthogonal.
h) If all eigenvalues of a normal operator are 1, then the operator is identity.
i) A linear operator may preserve norm but not the inner product.
2.2. True or false: The sum of normal operators is normal? Justify your conclusion. 
2.3. Show that an operator unitarily equivalent to a diagonal one is normal. 


2.4. Orthogonally diagonalize the matrix, 


deol a) 


Find all square roots of $A$, i.e. find all matrices $B$ such that $B^2 = A$.
Note, that all square roots of $A$ are self-adjoint.
2.5. True or false: any self-adjoint matrix has a self-adjoint square root. Justify. 
2.6. Orthogonally diagonalize the matrix, 
22 
a= (74): 
i.e. represent it as $A = UDU^*$, where $D$ is diagonal and $U$ is unitary.


Among all square roots of A, i.e. among all matrices B such that B? = A, find 
one that has positive eigenvalues. You can leave B as a product. 


2.7. True or false: 
a) A product of two self-adjoint matrices is self-adjoint. 
b) If A is self-adjoint, then A* is self-adjoint. 

Justify your conclusions 

2.8. Let $A$ be an $m \times n$ matrix. Prove that

a) $A^*A$ is self-adjoint.
b) All eigenvalues of $A^*A$ are non-negative.
c) $A^*A + I$ is invertible.


2.9. Give a proof if the statement is true, or give a counterexample if it is false: 


a) If $A = A^*$ then $A + iI$ is invertible.
If U is unitary, U + 31 is invertible. 


c) If a matrix $A$ is real, $A - iI$ is invertible.
2.10. Orthogonally diagonalize the rotation matrix
$$R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix},$$
where $\alpha$ is not a multiple of $\pi$. Note, that you will get complex
eigenvalues in this case.
2.11. Orthogonally diagonalize the matrix
$$A = \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}.$$
Hints: You will get real eigenvalues in this case. Also, the trigonometric
identities $\sin 2x = 2\sin x\cos x$, $\sin^2 x = (1 - \cos 2x)/2$,
$\cos^2 x = (1 + \cos 2x)/2$ (applied to $x = \alpha/2$) will help to simplify
expressions for eigenvectors.


2.12. Can you describe the linear transformation with matrix $A$ from the previous
problem geometrically? It has a very simple geometric interpretation.


2.13. Prove that a normal operator with unimodular eigenvalues (i.e. with all
eigenvalues satisfying $|\lambda_k| = 1$) is unitary. Hint: Consider diagonalization.


2.14. Prove that a normal operator with real eigenvalues is self-adjoint. 
2.15. Show by example that the conclusion of Theorem 2.2 fails for complex
symmetric matrices. Namely


a) construct a (diagonalizable) 2 x 2 complex symmetric matrix not admitting 
an orthogonal basis of eigenvectors; 


b) construct a 2x 2 complex symmetric matrix which cannot be diagonalized. 


3. Polar and singular value decompositions. 


3.1. Positive definite operators. Square roots. 

Definition. A self-adjoint operator $A : X \to X$ is called positive definite if
$$(A\mathbf x, \mathbf x) > 0 \qquad \forall \mathbf x \ne \mathbf 0,$$

and it is called positive semidefinite if
$$(A\mathbf x, \mathbf x) \ge 0 \qquad \forall \mathbf x \in X.$$


We will use the notation $A > 0$ for positive definite operators, and $A \ge 0$
for positive semidefinite.


The following theorem describes positive definite and semidefinite oper- 
ators. 


Theorem 3.1. Let $A = A^*$. Then


1. $A > 0$ if and only if all eigenvalues of $A$ are positive.
2. $A \ge 0$ if and only if all eigenvalues of $A$ are non-negative.


Proof. Pick an orthonormal basis such that matrix of A in this basis is 
diagonal (see Theorem 2.1). To finish the proof it remains to notice that a 
diagonal matrix is positive definite (positive semidefinite) if and only if all 
its diagonal entries are positive (non-negative). 


Corollary 3.2. Let $A = A^* \ge 0$ be a positive semidefinite operator. There
exists a unique positive semidefinite operator $B$ such that $B^2 = A$.


Such $B$ is called the (positive) square root of $A$ and is denoted as $\sqrt A$ or
$A^{1/2}$.


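Proof. Let us prove that $\sqrt A$ exists. Let $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n$ be an
orthonormal basis of eigenvectors of $A$, and let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the
corresponding eigenvalues. Note, that since $A \ge 0$, all $\lambda_k \ge 0$.

In the basis $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n$ the matrix of $A$ is a diagonal matrix
$\operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$ with entries $\lambda_1, \lambda_2, \ldots, \lambda_n$ on the
diagonal. Define the matrix of $B$ in the same basis as
$\operatorname{diag}\{\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\}$.


Clearly, $B = B^* \ge 0$ and $B^2 = A$.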
To prove that such $B$ is unique, let us suppose that there exists an
operator $C = C^* \ge 0$ such that $C^2 = A$. Let $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ be an
orthonormal basis of eigenvectors of $C$, and let $\mu_1, \mu_2, \ldots, \mu_n$ be the
corresponding eigenvalues (note that $\mu_k \ge 0$ $\forall k$). The matrix of $C$ in
the basis $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$ is a diagonal one
$\operatorname{diag}\{\mu_1, \mu_2, \ldots, \mu_n\}$, and therefore the matrix of $A = C^2$ in
the same basis is $\operatorname{diag}\{\mu_1^2, \mu_2^2, \ldots, \mu_n^2\}$. This implies that any
eigenvalue $\lambda$ of $A$ is of the form $\mu_k^2$, and, moreover, if $A\mathbf x = \lambda\mathbf x$, then
$C\mathbf x = \sqrt\lambda\,\mathbf x$.


Therefore in the basis $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_n$ above, the matrix of $C$ has the
diagonal form $\operatorname{diag}\{\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\}$, i.e. $B = C$.
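The existence part of the proof is constructive: diagonalize $A$, take square roots of the eigenvalues, and go back to the original basis. Here is a minimal sketch of that recipe for a real symmetric positive semidefinite $2 \times 2$ matrix. The eigenbasis is found with the Jacobi rotation angle (an assumption made to keep the example closed-form), and the function names are hypothetical.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def diagonalize_symmetric_2x2(A):
    # Rotation U (Jacobi angle) with U^T A U diagonal.
    theta = 0.5 * math.atan2(2 * A[0][1], A[0][0] - A[1][1])
    c, s = math.cos(theta), math.sin(theta)
    U = [[c, -s], [s, c]]
    return U, matmul(matmul(transpose(U), A), U)

def positive_sqrt_2x2(A):
    # B = U diag(sqrt(lambda_1), sqrt(lambda_2)) U^T, as in the proof above.
    U, D = diagonalize_symmetric_2x2(A)
    # max(..., 0.0) guards against tiny negative rounding errors.
    roots = [[math.sqrt(max(D[0][0], 0.0)), 0.0],
             [0.0, math.sqrt(max(D[1][1], 0.0))]]
    return matmul(matmul(U, roots), transpose(U))

A = [[5.0, 4.0], [4.0, 5.0]]       # eigenvalues 9 and 1, so A >= 0
B = positive_sqrt_2x2(A)
B_squared = matmul(B, B)
print(B)
```

For this `A` the result is, up to rounding, the matrix with rows $(2, 1)$ and $(1, 2)$, whose square is `A`.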


3.2. Modulus of an operator. Singular values. Consider an operator
$A : X \to Y$. Its Hermitian square $A^*A$ is a positive semidefinite operator
acting in $X$. Indeed,

$$(A^*A)^* = A^*A^{**} = A^*A$$
and
$$(A^*A\mathbf x, \mathbf x) = (A\mathbf x, A\mathbf x) = \|A\mathbf x\|^2 \ge 0 \qquad \forall \mathbf x \in X.$$
Therefore, there exists a (unique) positive semidefinite square root
$R = \sqrt{A^*A}$. This operator $R$ is called the modulus of the operator $A$, and
is often denoted as $|A|$.


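The modulus of $A$ shows how “big” the operator $A$ is:
Proposition 3.3. For a linear operator $A : X \to Y$
$$\||A|\mathbf x\| = \|A\mathbf x\| \qquad \forall \mathbf x \in X.$$
Proof. For any $\mathbf x \in X$
\begin{align*}
\||A|\mathbf x\|^2 &= (|A|\mathbf x, |A|\mathbf x) = (|A|^*|A|\mathbf x, \mathbf x) = (|A|^2\mathbf x, \mathbf x) \\
&= (A^*A\mathbf x, \mathbf x) = (A\mathbf x, A\mathbf x) = \|A\mathbf x\|^2 .
\end{align*}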


Corollary 3.4.
$$\operatorname{Ker} A = \operatorname{Ker}|A| = (\operatorname{Ran}|A|)^\perp.$$


Proof. The first equality follows immediately from Proposition 3.3, the
second one follows from the identity $\operatorname{Ker} T = (\operatorname{Ran} T^*)^\perp$ ($|A|$ is
self-adjoint).


Theorem 3.5 (Polar decomposition of an operator). Let $A : X \to X$ be an
operator (square matrix). Then $A$ can be represented as

$$A = U|A|,$$
where $U$ is a unitary operator.


Remark. The unitary operator U is generally not unique. As one will see 
from the proof of the theorem, U is unique if and only if A is invertible. 
Remark. The polar decomposition $A = U|A|$ also holds for operators
$A : X \to Y$ acting from one space to another. But in this case we can only
guarantee that $U$ is an isometry from $\operatorname{Ran}|A| = (\operatorname{Ker} A)^\perp$ to $Y$.


If $\dim X \le \dim Y$ this isometry can be extended to an isometry from
the whole $X$ to $Y$ (if $\dim X = \dim Y$ this will be a unitary operator).
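Proof of Theorem 3.5. Consider a vector $\mathbf{x} \in \operatorname{Ran}|A|$.
Then the vector $\mathbf{x}$ can be represented as $\mathbf{x} = |A|\mathbf{v}$
for some vector $\mathbf{v} \in X$.

Define $U_0\mathbf{x} := A\mathbf{v}$. By Proposition 3.3

\[ \|U_0\mathbf{x}\| = \|A\mathbf{v}\| = \bigl\| |A|\mathbf{v} \bigr\| = \|\mathbf{x}\|, \]

so it looks like $U_0$ is an isometry from $\operatorname{Ran}|A|$ to $X$.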


But first we need to prove that $U_0$ is well defined. Let $\mathbf{v}_1$
be another vector such that $\mathbf{x} = |A|\mathbf{v}_1$. But
$\mathbf{x} = |A|\mathbf{v} = |A|\mathbf{v}_1$ means that
$\mathbf{v} - \mathbf{v}_1 \in \operatorname{Ker}|A| = \operatorname{Ker} A$
(cf. Corollary 3.4), so $A\mathbf{v} = A\mathbf{v}_1$, meaning that
$U_0\mathbf{x}$ is well defined.

By the construction $A = U_0|A|$. We leave as an exercise for the reader
to check that $U_0$ is a linear transformation.


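To extend $U_0$ to a unitary operator $U$, let us find some unitary
transformation
$U_1 : \operatorname{Ker} A \to (\operatorname{Ran} A)^\perp = \operatorname{Ker} A^*$.
It is always possible to do this, since for square matrices
$\dim \operatorname{Ker} A = \dim \operatorname{Ker} A^*$ (the Rank Theorem).

It is easy to check that $U = U_0 + U_1$ is a unitary operator, and that
$A = U|A|$. Here $U_0$ acts on $\operatorname{Ran}|A|$ and $U_1$ acts on
$\operatorname{Ker} A = (\operatorname{Ran}|A|)^\perp$, so the sum is
understood as $U(\mathbf{x}_0 + \mathbf{x}_1) = U_0\mathbf{x}_0 + U_1\mathbf{x}_1$
for $\mathbf{x}_0 \in \operatorname{Ran}|A|$, $\mathbf{x}_1 \in \operatorname{Ker} A$.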


3.3. Singular values. Schmidt decomposition.


Definition. Eigenvalues of $|A|$ are called the singular values of $A$. In
other words, if $\lambda_1, \lambda_2, \ldots, \lambda_n$ are eigenvalues of
$A^*A$ then $\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}$
are singular values of $A$.


Remark. Very often in the literature the singular values are defined as the
non-negative square roots of the eigenvalues of $A^*A$, without any reference
to the modulus $|A|$.


I consider the notion of the modulus of an operator to be an important
one, so it was introduced above. However, the notion of the modulus of an
operator is not required for what follows (defining the Schmidt and singular
value decompositions). Moreover, as will be shown below, the modulus of
$A$ can be easily constructed from the singular value decomposition.


Consider an operator $A : X \to Y$, and let
$\sigma_1, \sigma_2, \ldots, \sigma_n$ be the singular values of $A$
counting multiplicities. Assume also that
$\sigma_1, \sigma_2, \ldots, \sigma_r$ are the non-zero singular values of
$A$, counting multiplicities. This means, in particular, that
$\sigma_k = 0$ for $k > r$.

By the definition of singular values the numbers
$\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$ are eigenvalues of $A^*A$. Let
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ be an orthonormal
basis$^1$ of eigenvectors of $A^*A$,
$A^*A\mathbf{v}_k = \sigma_k^2\mathbf{v}_k$.


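Proposition 3.6. The system

\[ \mathbf{w}_k := \frac{1}{\sigma_k} A\mathbf{v}_k, \qquad k = 1, 2, \ldots, r \]

is an orthonormal system.


Proof.

\[ (A\mathbf{v}_j, A\mathbf{v}_k) = (A^*A\mathbf{v}_j, \mathbf{v}_k) = (\sigma_j^2\mathbf{v}_j, \mathbf{v}_k) = \sigma_j^2(\mathbf{v}_j, \mathbf{v}_k) =
   \begin{cases} 0, & j \ne k \\ \sigma_j^2, & j = k \end{cases} \]

since $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is an orthonormal
system. Dividing by $\sigma_j\sigma_k > 0$ we get
$(\mathbf{w}_j, \mathbf{w}_k) = 0$ for $j \ne k$ and
$(\mathbf{w}_k, \mathbf{w}_k) = 1$.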


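In the notation of the above proposition, the operator $A$ can be
represented as

\[ A = \sum_{k=1}^{r} \sigma_k \mathbf{w}_k \mathbf{v}_k^*, \tag{3.1} \]

or, equivalently

\[ A\mathbf{x} = \sum_{k=1}^{r} \sigma_k (\mathbf{x}, \mathbf{v}_k)\mathbf{w}_k. \tag{3.2} \]

Indeed, we know that $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ is
an orthonormal basis in $X$. Then substituting $\mathbf{x} = \mathbf{v}_j$
into the right side of (3.2) we get

\[ \sum_{k=1}^{r} \sigma_k (\mathbf{v}_j, \mathbf{v}_k)\mathbf{w}_k = \sigma_j(\mathbf{v}_j, \mathbf{v}_j)\mathbf{w}_j = \sigma_j\mathbf{w}_j = A\mathbf{v}_j \qquad \text{if } j = 1, 2, \ldots, r, \]

and

\[ \sum_{k=1}^{r} \sigma_k (\mathbf{v}_j, \mathbf{v}_k)\mathbf{w}_k = \mathbf{0} = A\mathbf{v}_j \qquad \text{for } j > r. \]

So the operators in the left and right sides of (3.1) coincide on the basis
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, so they are equal.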


Definition. The above decomposition (3.1) (or (3.2)) is called the Schmidt 
decomposition of the operator A. 
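
The construction above can be tried numerically. The following is only a sketch, assuming NumPy is available; the function name `schmidt_decomposition`, the cutoff `tol`, and the example matrix are illustrative choices, not part of the text. It takes orthonormal eigenvectors of $A^*A$, forms $\mathbf{w}_k = \sigma_k^{-1}A\mathbf{v}_k$ for the non-zero singular values, and rebuilds $A$ from the sum (3.1).

```python
import numpy as np

def schmidt_decomposition(A, tol=1e-12):
    """Return (sigmas, W, V) with A = sum_k sigmas[k] * w_k v_k^*.

    v_k are orthonormal eigenvectors of A*A, sigma_k = sqrt(eigenvalue),
    and w_k = A v_k / sigma_k. `tol` is an illustrative cutoff below which
    an eigenvalue of A*A is treated as zero.
    """
    A = np.asarray(A, dtype=complex)
    # eigh is meant for Hermitian matrices: real eigenvalues (ascending)
    # and an orthonormal set of eigenvectors.
    eigvals, eigvecs = np.linalg.eigh(A.conj().T @ A)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > tol                         # non-zero singular values only
    sigmas = np.sqrt(eigvals[keep])
    V = eigvecs[:, keep]
    W = (A @ V) / sigmas                         # column k is A v_k / sigma_k
    return sigmas, W, V

A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 0.0]])   # made-up 3 x 2 matrix of rank one
sigmas, W, V = schmidt_decomposition(A)
A_rebuilt = sum(s * np.outer(W[:, k], V[:, k].conj()) for k, s in enumerate(sigmas))
```

For this matrix there is a single non-zero singular value, so the sum (3.1) has one term.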


Remark. Schmidt decomposition of an operator is not unique. Why? 


$^1$We know that for a self-adjoint operator ($A^*A$ in our case) there
exists an orthonormal basis of eigenvectors.

Lemma 3.7. Suppose $A$ can be represented as

\[ A = \sum_{k=1}^{r} \sigma_k \mathbf{w}_k \mathbf{v}_k^*, \]

where $\sigma_k > 0$ and $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$,
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ are some orthonormal
systems.


Then this representation gives a Schmidt decomposition of $A$.


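Proof. We only need to show that
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ are eigenvectors of
$A^*A$, $A^*A\mathbf{v}_k = \sigma_k^2\mathbf{v}_k$. Since
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ is an orthonormal system,

\[ \mathbf{w}_k^*\mathbf{w}_j = (\mathbf{w}_j, \mathbf{w}_k) = \delta_{kj} =
   \begin{cases} 0, & j \ne k \\ 1, & j = k, \end{cases} \]

and therefore
\[ A^*A = \sum_{k=1}^{r} \sigma_k^2 \mathbf{v}_k\mathbf{v}_k^*. \]

Since $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ is an orthonormal
system

\[ A^*A\mathbf{v}_j = \sum_{k=1}^{r} \sigma_k^2 \mathbf{v}_k\mathbf{v}_k^*\mathbf{v}_j = \sigma_j^2\mathbf{v}_j, \]

thus $\mathbf{v}_k$ are eigenvectors of $A^*A$.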


Corollary 3.8. Let

\[ A = \sum_{k=1}^{r} \sigma_k \mathbf{w}_k \mathbf{v}_k^* \]

be a Schmidt decomposition of $A$. Then

\[ A^* = \sum_{k=1}^{r} \sigma_k \mathbf{v}_k \mathbf{w}_k^* \]

is a Schmidt decomposition of $A^*$.


3.4. Matrix representation of the Schmidt decomposition. Singular value
decomposition. The Schmidt decomposition can be written in a nice matrix
form. Namely, let us assume that $A : \mathbb{F}^n \to \mathbb{F}^m$, where
$\mathbb{F}$ is either $\mathbb{C}$ or $\mathbb{R}$ (we can always do that by
fixing orthonormal bases in $X$ and $Y$ and working with coordinates in
these bases). Let $\sigma_1, \sigma_2, \ldots, \sigma_r$ be non-zero
singular values of $A$, and let

\[ A = \sum_{k=1}^{r} \sigma_k \mathbf{w}_k \mathbf{v}_k^* \]

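be a Schmidt decomposition of $A$.

As one can easily see, this equality can be rewritten as

\[ A = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*, \tag{3.3} \]
where $\widetilde{\Sigma} = \operatorname{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_r\}$
and $\widetilde{V}$ and $\widetilde{W}$ are matrices with columns
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ and
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ respectively. (Can you
tell what is the size of each matrix?)

Note that since $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ and
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ are orthonormal systems,
the matrices $\widetilde{V}$ and $\widetilde{W}$ are isometries. Note also
that $r = \operatorname{rank} A$, see Exercise 3.1 below.

If the matrix $A$ is invertible, then $m = n = r$, the matrices
$\widetilde{V}$, $\widetilde{W}$ are unitary and $\widetilde{\Sigma}$ is an
invertible diagonal matrix.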


It turns out that it is always possible to write a representation similar
to (3.3) with unitary $V$ and $W$ instead of the isometries $\widetilde{V}$
and $\widetilde{W}$, and in many situations it is more convenient to work
with such a representation. To write this representation one needs first to
complete the systems $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$ and
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_r$ to orthonormal bases in
$\mathbb{F}^n$ and $\mathbb{F}^m$ respectively.

Recall that to complete, say, $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$
to an orthonormal basis in $\mathbb{F}^n$ one just needs to find an
orthonormal basis $\mathbf{v}_{r+1}, \ldots, \mathbf{v}_n$ in
$\operatorname{Ker}\widetilde{V}^*$; then the system
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ will be an orthonormal
basis in $\mathbb{F}^n$. And one can always get an orthonormal basis from an
arbitrary one using Gram-Schmidt orthogonalization.


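Then $A$ can be represented as

\[ A = W\Sigma V^*, \tag{3.4} \]

where $V \in M^{\mathbb{F}}_{n \times n}$ and $W \in M^{\mathbb{F}}_{m \times m}$
are unitary matrices with columns
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and
$\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$ respectively, and $\Sigma$
is a “diagonal” $m \times n$ matrix

\[ \Sigma_{j,k} = \begin{cases} \sigma_k & j = k \le r; \\ 0 & \text{otherwise.} \end{cases} \tag{3.5} \]

In other words, to get the matrix $\Sigma$ one has to take the diagonal
matrix $\operatorname{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_r\}$ and
make it into an $m \times n$ matrix by adding extra zeroes “south and east”.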


Definition 3.9. For a matrix $A \in M^{\mathbb{F}}_{m \times n}$ (recall that
here $\mathbb{F}$ is always $\mathbb{C}$ or $\mathbb{R}$) its singular value
decomposition (SVD) is a decomposition of the form (3.4), i.e. a
decomposition $A = W\Sigma V^*$, where $W \in M^{\mathbb{F}}_{m \times m}$,
$V \in M^{\mathbb{F}}_{n \times n}$ are unitary matrices and
$\Sigma \in M^{\mathbb{F}}_{m \times n}$ is a “diagonal” one (meaning that
$\sigma_{k,k} \ge 0$ for all $k = 1, 2, \ldots, \min\{m, n\}$, and
$\sigma_{j,k} = 0$ for all $j \ne k$).

The representation (3.3) is often called the reduced or compact SVD.
More precisely, the reduced SVD is a representation
$A = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*$, where
$\widetilde{\Sigma} \in M^{\mathbb{F}}_{r \times r}$, $r \le \min\{m, n\}$, is
a diagonal matrix with strictly positive diagonal entries, and
$\widetilde{W} \in M^{\mathbb{F}}_{m \times r}$,
$\widetilde{V} \in M^{\mathbb{F}}_{n \times r}$ are isometries; moreover, we
require that at least one of the matrices $\widetilde{W}$ and
$\widetilde{V}$ is not square.
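
To see the difference between the two forms on a concrete matrix, here is a small sketch assuming NumPy; the example matrix and the relative cutoff used to decide which singular values count as non-zero are illustrative assumptions. Note that `numpy.linalg.svd` returns $V^*$ (called `Vh` below), not $V$.

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])                 # made-up 2 x 3 matrix of rank 2

# Full SVD: W is 2 x 2 and V is 3 x 3, both unitary.
W, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)            # zeroes added "south and east"

# Reduced SVD: keep only columns belonging to non-zero singular values.
r = int(np.sum(s > 1e-12 * s[0]))               # illustrative relative cutoff
W_r = W[:, :r]                                  # m x r isometry
S_r = np.diag(s[:r])                            # r x r, strictly positive diagonal
V_r = Vh[:r, :].conj().T                        # n x r isometry
```

Here `W @ Sigma @ Vh` and `W_r @ S_r @ V_r.conj().T` both reproduce `A`; in the reduced form `V_r` is not square.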


Remark 3.10. It is easy to see that if $A = W\Sigma V^*$ is a singular value
decomposition of $A$, then $\sigma_k := \sigma_{k,k}$ are singular values of
$A$, i.e. $\sigma_k^2$ are eigenvalues of $A^*A$. Moreover, the columns
$\mathbf{v}_k$ of $V$ are the corresponding eigenvectors of $A^*A$,
$A^*A\mathbf{v}_k = \sigma_k^2\mathbf{v}_k$. Note also that if
$\sigma_k \ne 0$ then $\mathbf{w}_k = \frac{1}{\sigma_k}A\mathbf{v}_k$.

All that means that any singular value decomposition $A = W\Sigma V^*$ can be
obtained from a Schmidt decomposition (3.2) by the construction described
above in this section.


The reduced singular value decomposition can be interpreted as a matrix 
form of the Schmidt decomposition (3.2) for a non-invertible matrix A. For 
an invertible matrix A the matrix form of the Schmidt decomposition gives 
the singular value decomposition. 


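Remark 3.11. An alternative way to interpret the singular value
decomposition $A = W\Sigma V^*$ is to say that $\Sigma$ is the matrix of $A$
in the (orthonormal) bases
$\mathcal{A} := \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and
$\mathcal{B} := \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m$, i.e. that
$\Sigma = [A]_{\mathcal{B},\mathcal{A}}$.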


We will use this interpretation later. 


3.4.1. From singular value decomposition to the polar decomposition. Note
that if we know the singular value decomposition $A = W\Sigma V^*$ of a
square matrix $A$, we can write a polar decomposition of $A$:

\[ A = W\Sigma V^* = (WV^*)(V\Sigma V^*) = U|A| \tag{3.6} \]
where $|A| = V\Sigma V^*$ and $U = WV^*$.


To see that this indeed gives us a polar decomposition let us notice that
$V\Sigma V^*$ is a self-adjoint, positive semidefinite operator and that

\[ A^*A = V\Sigma W^*W\Sigma V^* = V\Sigma\Sigma V^* = V\Sigma V^*V\Sigma V^* = (V\Sigma V^*)^2. \]

So by the definition of $|A|$ as the unique positive semidefinite square root
of $A^*A$, we can see that $|A| = V\Sigma V^*$. The transformation $WV^*$ is
clearly unitary, as a product of two unitary transformations, so (3.6)
indeed gives us a polar decomposition of $A$.


Note that this reasoning only works for square matrices, because if $A$ is
not square, then the product $V\Sigma$ is not defined (dimensions do not
match, can you see how?).
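
Formula (3.6) translates directly into a few lines of code. The sketch below assumes NumPy and uses a made-up $2 \times 2$ matrix; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0]])                  # made-up square example

W, s, Vh = np.linalg.svd(A)                 # A = W diag(s) Vh, with Vh = V*
V = Vh.conj().T

modulus = V @ np.diag(s) @ Vh               # |A| = V Sigma V*
U = W @ Vh                                  # U = W V*, a product of unitaries
```

One can then check that `U @ modulus` returns `A`, that `U` is unitary, and that `modulus @ modulus` agrees with `A.conj().T @ A`.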


Exercises. 


3.1. Show that the number of non-zero singular values of a matrix $A$
coincides with its rank.

3.2. Find Schmidt decompositions $A = \sum_{k=1}^{r} s_k \mathbf{w}_k \mathbf{v}_k^*$
for the following matrices $A$:


fe | 1 1 
coe 00], 01 
5 5 =. ul 


3.3. Let $A$ be an invertible matrix, and let $A = W\Sigma V^*$ be its
singular value decomposition. Find a singular value decomposition for $A^*$
and $A^{-1}$.


3.4. Find singular value decomposition A = WXV* where V and W are unitary 
matrices for the following matrices: 


a) $A = \begin{pmatrix} -3 & 1 \\ 6 & -2 \\ 6 & -2 \end{pmatrix}$;


wa-(2 3) 


3.5. Find singular value decomposition of the matrix 
2 3 
ale) 


Use it to find

a) $\max_{\|\mathbf{x}\| \le 1} \|A\mathbf{x}\|$ and the vectors where the maximum is attained;
b) $\min_{\|\mathbf{x}\| = 1} \|A\mathbf{x}\|$ and the vectors where the minimum is attained;
c) the image $A(B)$ of the closed unit ball in $\mathbb{R}^2$,
$B = \{\mathbf{x} \in \mathbb{R}^2 : \|\mathbf{x}\| \le 1\}$.
Describe $A(B)$ geometrically.
3.6. Show that for a square matrix $A$, $|\det A| = \det |A|$.
3.7. True or false 


a) Singular values of a matrix are also eigenvalues of the matrix. 
b) Singular values of a matrix A are eigenvalues of A* A. 


c) If $s$ is a singular value of a matrix $A$ and $c$ is a scalar, then
$|c|s$ is a singular value of $cA$.


d) The singular values of any linear operator are non-negative. 
e) Singular values of a self-adjoint matrix coincide with its eigenvalues. 
3.8. Let A be an m x n matrix. Prove that non-zero eigenvalues of the matrices 
A*A and AA* (counting multiplicities) coincide. 
Can you say when zero eigenvalue of A*A and zero eigenvalue of AA* have the 


same multiplicity? 


3.9. Let $s$ be the largest singular value of an operator $A$, and let
$\lambda$ be the eigenvalue of $A$ with largest absolute value. Show that
$|\lambda| \le s$.


3.10. Show that the rank of a matrix is the number of its non-zero singular values 
(counting multiplicities). 




3.11. Show that the operator norm of a matrix $A$ coincides with its
Frobenius norm if and only if the matrix has rank one. Hint: The previous
problem might help.


3.12. For the matrix $A$
\[ A = \begin{pmatrix} 2 & -3 \\ 0 & 2 \end{pmatrix}, \]
describe the inverse image of the unit ball, i.e. the set of all
$\mathbf{x} \in \mathbb{R}^2$ such that $\|A\mathbf{x}\| \le 1$. Use singular
value decomposition.


4. Applications of the singular value decomposition. 


As we discussed above, the singular value decomposition is simply
diagonalization with respect to two different orthonormal bases. Since we
have two different bases here, we cannot say much about spectral properties
of an operator from its singular value decomposition. For example, the
diagonal entries of $\Sigma$ in the singular value decomposition (3.5) are
not the eigenvalues of $A$. Note that for $A = W\Sigma V^*$ as in (3.5) we
generally have $A^n \ne W\Sigma^n V^*$, so this diagonalization does not help
us in computing functions of a matrix.


However, as the examples below show, singular values tell us a lot about 
so-called metric properties of a linear transformation. 


Final remark: performing singular value decomposition requires finding
eigenvalues and eigenvectors of the Hermitian (self-adjoint) matrix $A^*A$.
To find eigenvalues we usually computed the characteristic polynomial, found
its roots, and so on. This looks like quite a complicated process,
especially if one takes into account that there is no general formula for
finding roots of polynomials of degree 5 and higher.


However, there are very effective numerical methods for finding eigenvalues
and eigenvectors of a Hermitian matrix up to any given precision. These
methods do not involve computing the characteristic polynomial and finding
its roots. They compute approximate eigenvalues and eigenvectors directly
by an iterative procedure. Because a Hermitian matrix has an orthogonal
basis of eigenvectors, these methods work extremely well.


We will not discuss these methods here, as that goes beyond the scope of
this book. However, you should believe me that there are very effective
numerical methods for computing eigenvalues and eigenvectors of a Hermitian
matrix and for finding the singular value decomposition. These methods are
extremely effective, and just a little more computationally intensive than
solving a linear system.

4.1. Image of the unit ball. Consider for example the following problem:
let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, and let
$B = \{\mathbf{x} \in \mathbb{R}^n : \|\mathbf{x}\| \le 1\}$ be the closed
unit ball in $\mathbb{R}^n$. We want to describe $A(B)$, i.e. we want to
find out how the unit ball is transformed under the linear transformation.


Let us first consider the simplest case when $A$ is a diagonal matrix
$A = \operatorname{diag}\{\sigma_1, \sigma_2, \ldots, \sigma_n\}$,
$\sigma_k > 0$, $k = 1, 2, \ldots, n$. Then for
$\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$ and
$(y_1, y_2, \ldots, y_n)^T = \mathbf{y} = A\mathbf{x}$ we have
$y_k = \sigma_k x_k$ (equivalently, $x_k = y_k/\sigma_k$) for
$k = 1, 2, \ldots, n$, so

\[ \mathbf{y} = (y_1, y_2, \ldots, y_n)^T = A\mathbf{x} \qquad \text{for } \|\mathbf{x}\| \le 1, \]
if and only if the coordinates $y_1, y_2, \ldots, y_n$ satisfy the inequality

\[ \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} + \cdots + \frac{y_n^2}{\sigma_n^2} = \sum_{k=1}^{n} \frac{y_k^2}{\sigma_k^2} \le 1 \]

(this is simply the inequality $\|\mathbf{x}\|^2 = \sum_k |x_k|^2 \le 1$).


The set of points in $\mathbb{R}^n$ satisfying the above inequality is
called an ellipsoid. If $n = 2$ it is an ellipse with half-axes $\sigma_1$
and $\sigma_2$, for $n = 3$ it is an ellipsoid with half-axes
$\sigma_1, \sigma_2$ and $\sigma_3$. In $\mathbb{R}^n$ the geometry of this
set is also easy to visualize, and we call that set an ellipsoid with half
axes $\sigma_1, \sigma_2, \ldots, \sigma_n$. The vectors
$\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ or, more precisely, the
corresponding lines are called the principal axes of the ellipsoid.


The singular value decomposition essentially says that any operator in an
inner product space is diagonal with respect to a pair of orthonormal bases,
see Remark 3.11. Namely, consider the orthonormal bases
$\mathcal{A} = \mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ and
$\mathcal{B} = \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ from the
singular value decomposition (3.1). Then the matrix of $A$ in these bases is
diagonal

\[ [A]_{\mathcal{B},\mathcal{A}} = \operatorname{diag}\{\sigma_k : k = 1, 2, \ldots, n\}. \]

Assuming that all $\sigma_k > 0$ and essentially repeating the above
reasoning, it is easy to show that any point
$\mathbf{y} = A\mathbf{x} \in A(B)$ if and only if it satisfies the
inequality

\[ \frac{y_1^2}{\sigma_1^2} + \frac{y_2^2}{\sigma_2^2} + \cdots + \frac{y_n^2}{\sigma_n^2} = \sum_{k=1}^{n} \frac{y_k^2}{\sigma_k^2} \le 1 \]
where $y_1, y_2, \ldots, y_n$ are coordinates of $\mathbf{y}$ in the
orthonormal basis
$\mathcal{B} = \mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$, not in the
standard one. Similarly, $(x_1, x_2, \ldots, x_n)^T = [\mathbf{x}]_{\mathcal{A}}$.


But that is essentially the same ellipsoid as before, only “rotated” (with
different but still orthogonal principal axes)!
There is also an alternative explanation which is presented below.


Consider the general case, when the matrix $A$ is not necessarily square,
and (or) not all singular values are non-zero. Consider first the case of a
“diagonal” matrix $\Sigma$ of form (3.5). It is easy to see that the image
$\Sigma(B)$ of the unit ball $B$ is the ellipsoid (not in the whole space but
in $\operatorname{Ran}\Sigma$) with half axes
$\sigma_1, \sigma_2, \ldots, \sigma_r$.


Consider now the general case, $A = W\Sigma V^*$, where $V$, $W$ are unitary
operators. Unitary transformations do not change the unit ball (because
they preserve norm), so $V^*(B) = B$. We know that $\Sigma(B)$ is an
ellipsoid in $\operatorname{Ran}\Sigma$ with half-axes
$\sigma_1, \sigma_2, \ldots, \sigma_r$. Unitary transformations do not
change geometry of objects, so $W(\Sigma(B))$ is also an ellipsoid with the
same half-axes. It is not hard to see from the decomposition
$A = W\Sigma V^*$ (using the fact that both $W$ and $V^*$ are invertible)
that $W$ transforms $\operatorname{Ran}\Sigma$ to $\operatorname{Ran} A$, so
we can conclude:


the image $A(B)$ of the closed unit ball $B$ is an ellipsoid in
$\operatorname{Ran} A$ with half axes $\sigma_1, \sigma_2, \ldots, \sigma_r$.
Here $r$ is the number of non-zero singular values, i.e. the rank of $A$.
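
The conclusion can be checked on a sampled unit circle. The following sketch assumes NumPy and a made-up invertible $2 \times 2$ matrix; it maps points with $\|\mathbf{x}\| = 1$ by $A$, rewrites the images in the basis $\mathbf{w}_1, \mathbf{w}_2$ (the columns of $W$), and evaluates the left side of the ellipse equation.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])                      # made-up invertible example
W, s, Vh = np.linalg.svd(A)

theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack([np.cos(theta), np.sin(theta)])    # points with ||x|| = 1
image = A @ circle                                     # boundary of A(B)

# Coordinates of the image points in the orthonormal basis w_1, w_2.
y = W.T @ image
ellipse_lhs = (y[0] / s[0]) ** 2 + (y[1] / s[1]) ** 2  # equals 1 on the boundary

radii = np.linalg.norm(image, axis=0)                  # lengths ||Ax||
```

The values in `radii` stay between the smallest and the largest singular value, which are the two half-axes of the ellipse.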


4.2. Operator norm of a linear transformation. Given a linear
transformation $A : X \to Y$ let us consider the following optimization
problem: find the maximum of $\|A\mathbf{x}\|$ on the closed unit ball
$B = \{\mathbf{x} \in X : \|\mathbf{x}\| \le 1\}$.

Again, singular value decomposition allows us to solve the problem. For
a “diagonal” (like the matrix $\Sigma$ in the definition of the singular
value decomposition) matrix $A$ with non-negative entries the maximum is
exactly the maximal diagonal entry. Indeed, let $s_1, s_2, \ldots, s_r$ be
the non-zero diagonal entries of $A$ and let $s_1$ be the maximal one. Since
for $\mathbf{x} = (x_1, x_2, \ldots, x_n)^T$
\[ A\mathbf{x} = \sum_{k=1}^{r} s_k x_k \mathbf{e}_k, \tag{4.1} \]

we can conclude that

\[ \|A\mathbf{x}\|^2 = \sum_{k=1}^{r} s_k^2 |x_k|^2 \le s_1^2 \sum_{k=1}^{r} |x_k|^2 \le s_1^2 \cdot \|\mathbf{x}\|^2, \]
so $\|A\mathbf{x}\| \le s_1\|\mathbf{x}\|$. On the other hand,
$\|A\mathbf{e}_1\| = \|s_1\mathbf{e}_1\| = s_1\|\mathbf{e}_1\|$, so indeed
$s_1$ is the maximum of $\|A\mathbf{x}\|$ on the closed unit ball $B$. Note
that in the above reasoning we did not assume that the matrix $A$ is square;
we only assumed that all entries outside the “main diagonal” are 0, so
formula (4.1) holds.


To treat the general case let us consider the singular value decomposition
(3.5), $A = W\Sigma V^*$, where $W$, $V$ are unitary operators, and $\Sigma$
is the “diagonal” matrix with non-negative entries. Since unitary
transformations do not change the norm, one can conclude that the maximum of
$\|A\mathbf{x}\|$ on the unit ball $B$ is the maximal diagonal entry of
$\Sigma$, i.e. that


the maximum of $\|A\mathbf{x}\|$ on the unit ball $B$ is the maximal singular
value of $A$.

Definition. The quantity
$\max\{\|A\mathbf{x}\| : \mathbf{x} \in X, \|\mathbf{x}\| \le 1\}$ is called
the operator norm of $A$ and denoted $\|A\|$.


It is an easy exercise to see that $\|A\|$ satisfies all properties of the
norm:

1. $\|\alpha A\| = |\alpha| \cdot \|A\|$;
2. $\|A + B\| \le \|A\| + \|B\|$;
3. $\|A\| \ge 0$ for all $A$;
4. $\|A\| = 0$ if and only if $A = \mathbf{0}$,

so it is indeed a norm on a space of linear transformations from $X$ to $Y$.


One of the main properties of the operator norm is the inequality
\[ \|A\mathbf{x}\| \le \|A\| \cdot \|\mathbf{x}\|, \]
which follows easily from the homogeneity of the norm $\|\mathbf{x}\|$.
In fact, it can be shown that the operator norm $\|A\|$ is the best
(smallest) number $C \ge 0$ such that
\[ \|A\mathbf{x}\| \le C\|\mathbf{x}\| \qquad \forall \mathbf{x} \in X. \]

This is often used as a definition of the operator norm.


On the space of linear transformations we already have one norm, the
Frobenius, or Hilbert-Schmidt norm $\|A\|_2$,

\[ \|A\|_2^2 = \operatorname{trace}(A^*A). \]
So, let us investigate how these two norms compare.


Let $s_1, s_2, \ldots, s_r$ be non-zero singular values of $A$ (counting
multiplicities), and let $s_1$ be the largest of them. Then
$s_1^2, s_2^2, \ldots, s_r^2$ are non-zero eigenvalues of $A^*A$ (again
counting multiplicities). Recalling that the trace equals the sum of the
eigenvalues we conclude that

\[ \|A\|_2^2 = \operatorname{trace}(A^*A) = \sum_{k=1}^{r} s_k^2. \]
On the other hand we know that the operator norm of $A$ equals its largest
singular value, i.e. $\|A\| = s_1$. So we can conclude that
$\|A\| \le \|A\|_2$, i.e. that


the operator norm of a matrix cannot be more than its Frobenius 
norm. 
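
Both norms can be read off the singular values, which gives a quick numerical check of the inequality. This is a sketch assuming NumPy, with a made-up matrix; `numpy.linalg.norm` with `ord=2` and `ord='fro'` is used only for comparison.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                        # made-up 3 x 2 example

s = np.linalg.svd(A, compute_uv=False)            # singular values, largest first

operator_norm = s[0]                              # largest singular value
frobenius_norm = np.sqrt(np.sum(s ** 2))          # sqrt(trace(A* A))
```

The two values coincide with `np.linalg.norm(A, 2)` and `np.linalg.norm(A, 'fro')` respectively, and the first never exceeds the second.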


This statement also admits a direct proof using the Cauchy-Schwarz
inequality, and such a proof is presented in some textbooks. The beauty of
the proof we presented here is that it does not require any computations
and illuminates the reasons behind the inequality.

4.3. Condition number of a matrix. Suppose we have an invertible
matrix $A$ and we want to solve the equation $A\mathbf{x} = \mathbf{b}$. The
solution, of course, is given by $\mathbf{x} = A^{-1}\mathbf{b}$, but we want
to investigate what happens if we know the data only approximately.

That happens in real life, when the data is obtained, for example, by
some experiments. But even if we have exact data, round-off errors during
computations by a computer may have the same effect of distorting the data.


Let us consider the simplest model: suppose there is a small error in the
right side of the equation. That means, instead of the equation
$A\mathbf{x} = \mathbf{b}$ we are solving

\[ A\mathbf{x} = \mathbf{b} + \Delta\mathbf{b}, \]
where $\Delta\mathbf{b}$ is a small perturbation of the right side
$\mathbf{b}$.


So, instead of the exact solution $\mathbf{x}$ of $A\mathbf{x} = \mathbf{b}$
we get the approximate solution $\mathbf{x} + \Delta\mathbf{x}$ of
$A(\mathbf{x} + \Delta\mathbf{x}) = \mathbf{b} + \Delta\mathbf{b}$. We are
assuming that $A$ is invertible, so
$\Delta\mathbf{x} = A^{-1}\Delta\mathbf{b}$.

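We want to know how big is the relative error in the solution
$\|\Delta\mathbf{x}\|/\|\mathbf{x}\|$ in comparison with the relative error
in the right side $\|\Delta\mathbf{b}\|/\|\mathbf{b}\|$. It is easy to see
that

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} = \frac{\|A^{-1}\Delta\mathbf{b}\|}{\|\mathbf{x}\|}
   = \frac{\|A^{-1}\Delta\mathbf{b}\|}{\|\mathbf{b}\|} \cdot \frac{\|\mathbf{b}\|}{\|\mathbf{x}\|}
   = \frac{\|A^{-1}\Delta\mathbf{b}\|}{\|\mathbf{b}\|} \cdot \frac{\|A\mathbf{x}\|}{\|\mathbf{x}\|}. \]
Since $\|A^{-1}\Delta\mathbf{b}\| \le \|A^{-1}\| \cdot \|\Delta\mathbf{b}\|$
and $\|A\mathbf{x}\| \le \|A\| \cdot \|\mathbf{x}\|$ we can conclude that

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \le \|A^{-1}\| \cdot \|A\| \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}. \]

The quantity $\|A\| \cdot \|A^{-1}\|$ is called the condition number of the
matrix $A$. It estimates how the relative error in the solution $\mathbf{x}$
depends on the relative error in the right side $\mathbf{b}$.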

Let us see how this quantity is related to singular values. Let the numbers
$s_1, s_2, \ldots, s_n$ be the singular values of $A$, and let us assume that
$s_1$ is the largest singular value and $s_n$ is the smallest. We know that
the (operator) norm of an operator equals its largest singular value, and
the singular values of $A^{-1}$ are $1/s_k$, so

\[ \|A\| = s_1, \qquad \|A^{-1}\| = \frac{1}{s_n}, \]

so

\[ \|A\| \cdot \|A^{-1}\| = \frac{s_1}{s_n}. \]

In other words


The condition number of a matrix equals the ratio of the largest
and the smallest singular values.

We deduced above that
$\frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} \le \|A^{-1}\| \cdot \|A\| \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}$.
It is not hard to see that this estimate is sharp, i.e. that it is possible
to pick the right side $\mathbf{b}$ and the error $\Delta\mathbf{b}$ such
that we have equality

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} = \|A^{-1}\| \cdot \|A\| \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}. \]
We just put $\mathbf{b} = \mathbf{w}_1$ and
$\Delta\mathbf{b} = \alpha\mathbf{w}_n$, where $\mathbf{w}_1$ and
$\mathbf{w}_n$ are respectively the first and the last column of the matrix
$W$ in the singular value decomposition $A = W\Sigma V^*$, and
$\alpha \ne 0$ is an arbitrary scalar. Here, as usual, the singular values
are assumed to be in non-increasing order
$s_1 \ge s_2 \ge \ldots \ge s_n$, so $s_1$ is the largest and $s_n$ is the
smallest singular value.


We leave the details as an exercise for the reader. 


A matrix is called well conditioned if its condition number is not too big. 
If the condition number is big, the matrix is called ill conditioned. What is 
“big” here depends on the problem: with what precision you can find your 
right side, what precision is required for the solution, etc. 
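
The sharpness claim is easy to test numerically. The sketch below assumes NumPy; the matrix and the scalar `1e-3` (playing the role of $\alpha$) are made-up values. It takes $\mathbf{b} = \mathbf{w}_1$, $\Delta\mathbf{b} = \alpha\mathbf{w}_n$ and compares the two relative errors.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 0.5]])                      # made-up invertible example
W, s, Vh = np.linalg.svd(A)
cond = s[0] / s[-1]                             # largest / smallest singular value

b = W[:, 0]                                     # first column of W
db = 1e-3 * W[:, -1]                            # alpha times the last column of W
x = np.linalg.solve(A, b)                       # exact solution of A x = b
dx = np.linalg.solve(A, db)                     # error in the solution, A dx = db

rel_err_solution = np.linalg.norm(dx) / np.linalg.norm(x)
rel_err_rhs = np.linalg.norm(db) / np.linalg.norm(b)
```

For this choice `rel_err_solution` equals `cond * rel_err_rhs` up to round-off, and `cond` agrees with `np.linalg.cond(A)`.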


4.4. Effective rank of a matrix. Theoretically, the rank of a matrix is
easy to compute: one just needs to row reduce the matrix and count pivots.
However, in practical applications not everything is so easy. The main
reason is that very often we do not know the exact matrix, we only know its
approximation up to some precision.

Moreover, even if we know the exact matrix, most computer programs
introduce round-off errors in the computations, so effectively we cannot
distinguish between a zero pivot and a very small pivot.


A simple naive idea of working with round-off errors is as follows. When
computing the rank (and other objects related to it, like column space,
kernel, etc.) one simply sets up a tolerance (some small number) and if the
pivot is smaller than the tolerance, counts it as zero. The advantage of
this approach is its simplicity, since it is very easy to program. However,
the main disadvantage is that it is impossible to see what the tolerance is
responsible for. For example, what do we lose if we set the tolerance equal
to $10^{-6}$? How much better will $10^{-8}$ be?


While the above approach works well for well conditioned matrices, it is 
not very reliable in the general case. 


A better approach is to use singular values. It requires more computations,
but gives much better results, which are easier to interpret. In this
approach we also set up some small number as a tolerance, and then perform
singular value decomposition. Then we simply treat singular values
smaller than the tolerance as zero. The advantage of this approach is that
we can see what we are doing. The singular values are the half-axes of the
ellipsoid $A(B)$ ($B$ is the closed unit ball), so by setting up the
tolerance we are just deciding how “thin” the ellipsoid should be to be
considered “flat”.
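
A minimal sketch of this approach, assuming NumPy: the helper name `effective_rank`, the tolerance values, and the perturbed matrix are all made up for illustration. NumPy's own `numpy.linalg.matrix_rank` accepts a `tol` argument and counts singular values above it in the same way.

```python
import numpy as np

def effective_rank(A, tol):
    """Number of singular values of A that exceed the tolerance `tol`."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol))

# A rank-one matrix plus a tiny perturbation imitating round-off (made-up data).
exact = np.outer([1.0, 2.0, 3.0], [1.0, -1.0, 2.0])
noisy = exact + 1e-9 * np.array([[1.0, -2.0, 0.5],
                                 [0.3, 1.0, -1.0],
                                 [-0.7, 0.2, 1.5]])

rank_at_1e6 = effective_rank(noisy, 1e-6)       # perturbation is treated as zero
```

With the tolerance $10^{-6}$ the ellipsoid for `noisy` is declared "flat" in all directions but one, which matches the rank of `exact`.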


4.5. Moore-Penrose (pseudo)inverse. As we discussed in Section 4 of
Chapter 5 above, the least square solution gives us, in the case when an
equation $A\mathbf{x} = \mathbf{b}$ does not have a solution, the “next best
thing” (and gives us the solution of $A\mathbf{x} = \mathbf{b}$ when it
exists).


Note that the question of uniqueness is not addressed by the least square
solution: a solution of the normal equation
$A^*A\mathbf{x} = A^*\mathbf{b}$ does not have to be unique. A natural
distinguished solution would be a solution of minimal norm; such a solution
is indeed unique, and can be obtained by taking an arbitrary solution and
then taking its projection onto
$(\operatorname{Ker} A^*A)^\perp = (\operatorname{Ker} A)^\perp$, see
problems 4.5 and 4.6 in Chapter 5.


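It is not hard to see that if
$A = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*$ is a reduced singular
value decomposition of $A$, then the minimal norm least square solution
$\mathbf{x}_0$ is given by
\[ \mathbf{x}_0 = \widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^*\mathbf{b}. \tag{4.2} \]

Indeed, $\mathbf{x}_0$ is a least square solution of
$A\mathbf{x} = \mathbf{b}$ (i.e. a solution of
$A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$):

\[ A\mathbf{x}_0 = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*\widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^*\mathbf{b}
   = \widetilde{W}\widetilde{\Sigma}\widetilde{\Sigma}^{-1}\widetilde{W}^*\mathbf{b}
   = \widetilde{W}\widetilde{W}^*\mathbf{b} = P_{\operatorname{Ran} A}\mathbf{b}; \]

in the last equality in the chain we used the fact that
$\widetilde{W}\widetilde{W}^* = P_{\operatorname{Ran}\widetilde{W}}$ (because
$P_{\operatorname{Ran}\widetilde{W}} = \widetilde{W}(\widetilde{W}^*\widetilde{W})^{-1}\widetilde{W}^* = \widetilde{W}\widetilde{W}^*$)
and that $\operatorname{Ran}\widetilde{W} = \operatorname{Ran} A$ (see
Problem 4.4 below).


The general solution of $A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$ is
given by

\[ \mathbf{x} = \mathbf{x}_0 + \mathbf{y}, \qquad \mathbf{y} \in \operatorname{Ker} A. \]

Since
$\mathbf{x}_0 \in \operatorname{Ran}\widetilde{V} = \operatorname{Ran} A^* = (\operatorname{Ker} A)^\perp$
is orthogonal to every such $\mathbf{y}$, we have
$\|\mathbf{x}\|^2 = \|\mathbf{x}_0\|^2 + \|\mathbf{y}\|^2$, so
$\mathbf{x}_0$ is indeed a unique minimal norm solution of
$A\mathbf{x} = P_{\operatorname{Ran} A}\mathbf{b}$, or equivalently, the
minimal norm least square solution of $A\mathbf{x} = \mathbf{b}$.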


Definition 4.1. The operator
$A^+ := \widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^*$, where
$A = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*$ is a reduced singular
value decomposition of $A$, is called the Moore-Penrose inverse (or
Moore-Penrose pseudoinverse) of the operator $A$. In other words, the
Moore-Penrose inverse is the operator giving the unique minimal norm least
square solution of $A\mathbf{x} = \mathbf{b}$.


Remark 4.2. In the literature the Moore-Penrose inverse is usually defined
as a matrix $A^+$ such that

1. $AA^+A = A$;
2. $A^+AA^+ = A^+$;
3. $(AA^+)^* = AA^+$;
4. $(A^+A)^* = A^+A$.


It is very easy to check that the operator
$A^+ := \widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^*$ satisfies
properties 1-4 above.
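
Both the definition via the reduced singular value decomposition and properties 1-4 can be verified on an example. This is a sketch assuming NumPy; the function name `pinv_via_reduced_svd`, the relative cutoff `tol`, and the data are illustrative assumptions. The result is compared with NumPy's built-in `numpy.linalg.pinv`.

```python
import numpy as np

def pinv_via_reduced_svd(A, tol=1e-12):
    """Moore-Penrose inverse built from a reduced SVD of A."""
    W, s, Vh = np.linalg.svd(A, full_matrices=False)
    r = int(np.sum(s > tol * s[0]))             # illustrative relative cutoff
    W_r, s_r, V_r = W[:, :r], s[:r], Vh[:r, :].conj().T
    return (V_r / s_r) @ W_r.conj().T           # V~ Sigma~^{-1} W~*

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                      # rank one: least square solutions are not unique
b = np.array([1.0, 0.0, 1.0])

A_plus = pinv_via_reduced_svd(A)
x0 = A_plus @ b                                 # minimal norm least square solution
```

Any other least square solution differs from `x0` by a vector in $\operatorname{Ker} A$ and therefore has a larger norm.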


It is also possible (although a bit harder) to show that an operator
$A^+$ satisfying properties 1-4 is unique. Indeed, right and left
multiplying identity 1 by $A^+$, we get that $(A^+A)^2 = A^+A$ and
$(AA^+)^2 = AA^+$; together with properties 3 and 4 this means that $A^+A$
and $AA^+$ are orthogonal projections (see Problem 5.6 in Chapter 5).


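Trivially, $\operatorname{Ker} A \subset \operatorname{Ker} A^+A$. On the
other hand, identity 1 implies that
$\operatorname{Ker} A^+A \subset \operatorname{Ker} A$ (why?), so
$\operatorname{Ker} A^+A = \operatorname{Ker} A$. But this means that $A^+A$
is the orthogonal projection onto
$(\operatorname{Ker} A)^\perp = \operatorname{Ran} A^*$,

\[ A^+A = P_{\operatorname{Ran} A^*}. \]

Property 1 also implies that $AA^+\mathbf{y} = \mathbf{y}$ for all
$\mathbf{y} \in \operatorname{Ran} A$. Since $AA^+$ is an orthogonal
projection, we conclude that
$\operatorname{Ran} A \subset \operatorname{Ran} AA^+$. The opposite
inclusion $\operatorname{Ran} AA^+ \subset \operatorname{Ran} A$ is trivial,
so $AA^+$ is the orthogonal projection onto $\operatorname{Ran} A$,

\[ AA^+ = P_{\operatorname{Ran} A}. \]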

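Knowing $A^+A$ and $AA^+$ we can rewrite property 2 as

\[ P_{\operatorname{Ran} A^*}A^+ = A^+ \qquad \text{or} \qquad A^+P_{\operatorname{Ran} A} = A^+. \]
Combining the above identities we get
\[ P_{\operatorname{Ran} A^*}A^+P_{\operatorname{Ran} A} = A^+. \]

Finally, for any $\mathbf{b}$ in the target space of $A$

\[ \mathbf{x}_0 := A^+\mathbf{b} = P_{\operatorname{Ran} A^*}A^+\mathbf{b} \in \operatorname{Ran} A^* \]

and

\[ A\mathbf{x}_0 = AA^+\mathbf{b} = P_{\operatorname{Ran} A}\mathbf{b}, \]

i.e. $\mathbf{x}_0$ is a least square solution of
$A\mathbf{x} = \mathbf{b}$. Since
$\mathbf{x}_0 \in \operatorname{Ran} A^* = (\operatorname{Ker} A)^\perp$,
$\mathbf{x}_0$ is, as we discussed above, the least square solution of
minimal norm. But, as we had shown before, such minimal norm solution is
given by (4.2), so
$A^+ = \widetilde{V}\widetilde{\Sigma}^{-1}\widetilde{W}^*$.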


Exercises. 


4.1. Find norms and condition numbers for the following matrices:
a) $A = \begin{pmatrix} 4 & 0 \\ 1 & 3 \end{pmatrix}$;
5 3 
yan(§ 3). 
For the matrix $A$ from part a) present an example of the right side
$\mathbf{b}$ and the error $\Delta\mathbf{b}$ such that

\[ \frac{\|\Delta\mathbf{x}\|}{\|\mathbf{x}\|} = \|A\| \cdot \|A^{-1}\| \cdot \frac{\|\Delta\mathbf{b}\|}{\|\mathbf{b}\|}; \]
here $A\mathbf{x} = \mathbf{b}$ and $A\Delta\mathbf{x} = \Delta\mathbf{b}$.

4.2. Let $A$ be a normal operator, and let
$\lambda_1, \lambda_2, \ldots, \lambda_n$ be its eigenvalues (counting
multiplicities). Show that singular values of $A$ are
$|\lambda_1|, |\lambda_2|, \ldots, |\lambda_n|$.


4.3. Find singular values, norm and condition number of the matrix
\[ A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}. \]
You can do this problem practically without any computations, if you use the
previous problem and can answer the following questions:


a) What are singular values (eigenvalues) of an orthogonal projection
$P_E$ onto some subspace $E$?


b) What is the matrix of the orthogonal projection onto the subspace spanned
by the vector $(1, 1, 1)^T$?


c) How are the eigenvalues of the operators $T$ and $aT + bI$, where $a$ and
$b$ are scalars, related?


Of course, you can also just honestly do the computations. 


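4.4. Let $A = \widetilde{W}\widetilde{\Sigma}\widetilde{V}^*$ be a reduced
singular value decomposition of $A$. Show that
$\operatorname{Ran} A = \operatorname{Ran}\widetilde{W}$, and then by taking
adjoint that $\operatorname{Ran} A^* = \operatorname{Ran}\widetilde{V}$.


4.5. Write a formula for the Moore-Penrose inverse $A^+$ in terms of the
singular value decomposition $A = W\Sigma V^*$.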


4.6. Tychonov’s regularization: Prove that the Moore-Penrose inverse $A^+$
can be computed as the limits

\[ A^+ = \lim_{\varepsilon \to 0^+} (A^*A + \varepsilon I)^{-1}A^* = \lim_{\varepsilon \to 0^+} A^*(AA^* + \varepsilon I)^{-1}. \]


5. Structure of orthogonal matrices 


An orthogonal matrix U with detU = 1 is often called a rotation. The 
theorem below explains this name. 


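Theorem 5.1. Let $U$ be an orthogonal operator in $\mathbb{R}^n$ and let
$\det U = 1$.$^2$ Then there exists an orthonormal basis
$\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$ such that the matrix
of $U$ in this basis has the block diagonal form

\[ \begin{pmatrix}
     R_{\varphi_1} & & & & \mathbf{0} \\
     & R_{\varphi_2} & & & \\
     & & \ddots & & \\
     & & & R_{\varphi_k} & \\
     \mathbf{0} & & & & I_{n-2k}
   \end{pmatrix} \]
where $R_{\varphi_k}$ are 2-dimensional rotations,
\[ R_{\varphi_k} = \begin{pmatrix} \cos\varphi_k & -\sin\varphi_k \\ \sin\varphi_k & \cos\varphi_k \end{pmatrix} \]

and $I_{n-2k}$ stands for the identity matrix of size
$(n - 2k) \times (n - 2k)$.

$^2$For an orthogonal matrix $U$, $\det U = \pm 1$.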


Proof. We know that if $p$ is a polynomial with real coefficients and
$\lambda$ is its complex root, $p(\lambda) = 0$, then $\overline{\lambda}$ is
a root of $p$ as well, $p(\overline{\lambda}) = 0$ (this can easily be
checked by plugging $\overline{\lambda}$ into
$p(z) = \sum_{k=0}^{n} a_k z^k$).

Therefore, all complex eigenvalues of a real matrix $A$ can be split into
pairs $\lambda_k$, $\overline{\lambda}_k$.

We know that eigenvalues of a unitary matrix have absolute value 1, so all
complex eigenvalues of $U$ can be written as
$\lambda_k = \cos\alpha_k + i\sin\alpha_k$,
$\overline{\lambda}_k = \cos\alpha_k - i\sin\alpha_k$.

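Fix a pair of complex eigenvalues $\lambda$ and $\overline{\lambda}$, and
let $\mathbf{u} \in \mathbb{C}^n$ be the eigenvector of $U$,
$U\mathbf{u} = \lambda\mathbf{u}$. Then
$U\overline{\mathbf{u}} = \overline{\lambda}\,\overline{\mathbf{u}}$. Now,
split $\mathbf{u}$ into real and imaginary parts, i.e. define

\[ \mathbf{x} := \operatorname{Re}\mathbf{u} = (\mathbf{u} + \overline{\mathbf{u}})/2, \qquad
   \mathbf{y} := \operatorname{Im}\mathbf{u} = (\mathbf{u} - \overline{\mathbf{u}})/(2i), \]

so $\mathbf{u} = \mathbf{x} + i\mathbf{y}$ (note that $\mathbf{x}$,
$\mathbf{y}$ are real vectors, i.e. vectors with real entries). Then

\[ U\mathbf{x} = U\tfrac{1}{2}(\mathbf{u} + \overline{\mathbf{u}}) = \tfrac{1}{2}(\lambda\mathbf{u} + \overline{\lambda}\,\overline{\mathbf{u}}) = \operatorname{Re}(\lambda\mathbf{u}). \]
Similarly,
\[ U\mathbf{y} = \tfrac{1}{2i}U(\mathbf{u} - \overline{\mathbf{u}}) = \tfrac{1}{2i}(\lambda\mathbf{u} - \overline{\lambda}\,\overline{\mathbf{u}}) = \operatorname{Im}(\lambda\mathbf{u}). \]
Since $\lambda = \cos\alpha + i\sin\alpha$, we have

\[ \lambda\mathbf{u} = (\cos\alpha + i\sin\alpha)(\mathbf{x} + i\mathbf{y}) = ((\cos\alpha)\mathbf{x} - (\sin\alpha)\mathbf{y}) + i((\cos\alpha)\mathbf{y} + (\sin\alpha)\mathbf{x}), \]
so
\[ U\mathbf{x} = \operatorname{Re}(\lambda\mathbf{u}) = (\cos\alpha)\mathbf{x} - (\sin\alpha)\mathbf{y}, \qquad
   U\mathbf{y} = \operatorname{Im}(\lambda\mathbf{u}) = (\cos\alpha)\mathbf{y} + (\sin\alpha)\mathbf{x}. \]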


In other words, $U$ leaves the 2-dimensional subspace $E_\lambda$ spanned by
the vectors $\mathbf{x}$, $\mathbf{y}$ invariant and the matrix of the
restriction of $U$ onto this subspace is the rotation matrix
\[ R_{-\alpha} = \begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}. \]

Note that the vectors $\mathbf{u}$ and $\overline{\mathbf{u}}$
(eigenvectors of a unitary matrix, corresponding to different eigenvalues)
are orthogonal, so by the Pythagorean Theorem

\[ \|\mathbf{x}\| = \|\mathbf{y}\| = \frac{\sqrt{2}}{2}\|\mathbf{u}\|. \]

It is easy to check that $\mathbf{x} \perp \mathbf{y}$, so $\mathbf{x}$,
$\mathbf{y}$ is an orthogonal basis in $E_\lambda$. If we multiply each
vector in the basis $\mathbf{x}$, $\mathbf{y}$ by the same non-zero number,
we do not change matrices of linear transformations, so without loss of
generality we can assume that $\|\mathbf{x}\| = \|\mathbf{y}\| = 1$, i.e.
that $\mathbf{x}$, $\mathbf{y}$ is an orthonormal basis in $E_\lambda$.

Let us complete the orthonormal system $v_1 = x$, $v_2 = y$ to an orthonormal basis in $\mathbb{R}^n$. Since $UE_\lambda \subset E_\lambda$, i.e. $E_\lambda$ is an invariant subspace of $U$, the matrix of $U$ in this basis has the block triangular form

$$\begin{pmatrix} R_{-\alpha} & * \\ 0 & U_1 \end{pmatrix},$$

where $0$ stands for the $(n-2) \times 2$ block of zeroes.


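Since the rotation matrix $R_{-\alpha}$ is invertible, we have $UE_\lambda = E_\lambda$. Therefore

$$U^*E_\lambda = U^{-1}E_\lambda = E_\lambda,$$

so the matrix of $U$ in the basis we constructed is in fact block diagonal,

$$\begin{pmatrix} R_{-\alpha} & 0 \\ 0 & U_1 \end{pmatrix}.$$

Since $U$ is unitary,

$$I = U^*U = \begin{pmatrix} I & 0 \\ 0 & U_1^*U_1 \end{pmatrix},$$

so, since $U_1$ is square, it is also unitary.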
If $U_1$ has complex eigenvalues we can apply the same procedure to decrease its size by 2 until we are left with a block that has only real eigenvalues. Real eigenvalues can be only $+1$ or $-1$, so in some orthonormal basis the matrix of $U$ has the block diagonal form

$$\operatorname{diag}\{R_{-\alpha_1}, R_{-\alpha_2}, \dots, -I_r, I_l\};$$

here $I_r$ and $I_l$ are identity matrices of size $r \times r$ and $l \times l$ respectively. Since $\det U = 1$, the multiplicity of the eigenvalue $-1$ (i.e. $r$) must be even.


Note that the $2 \times 2$ matrix $-I_2$ can be interpreted as the rotation through the angle $\pi$. Therefore, the above matrix has the form given in the conclusion of the theorem with $\varphi_k = -\alpha_k$ or $\varphi_k = \pi$.


Let us give a different interpretation of Theorem 5.1. Define $T_j$ to be a rotation through $\varphi_j$ in the plane spanned by the vectors $v_{2j-1}$, $v_{2j}$. Then Theorem 5.1 simply says that $U$ is the composition of the rotations $T_j$, $j = 1, 2, \dots, k$. Note that because the rotations $T_j$ act in mutually orthogonal planes, they commute, i.e. it does not matter in what order we take the composition. So, the theorem can be interpreted as follows:


Any rotation in $\mathbb{R}^n$ can be represented as a composition of at most $n/2$ commuting planar rotations.


If an orthogonal matrix has determinant —1, its structure is described 
by the following theorem. 


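Theorem 5.2. Let $U$ be an orthogonal operator in $\mathbb{R}^n$, and let $\det U = -1$. Then there exists an orthonormal basis $v_1, v_2, \dots, v_n$ such that the matrix of $U$ in this basis has block diagonal form

$$\begin{pmatrix} R_{\varphi_1} & & & & \\ & \ddots & & & \\ & & R_{\varphi_k} & & \\ & & & I_r & \\ & & & & -1 \end{pmatrix},$$

where $r = n - 2k - 1$ and $R_{\varphi_k}$ are 2-dimensional rotations,

$$R_{\varphi_k} = \begin{pmatrix} \cos\varphi_k & -\sin\varphi_k \\ \sin\varphi_k & \cos\varphi_k \end{pmatrix},$$

and $I_r$ stands for the identity matrix of size $r \times r$.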
We leave the proof as an exercise for the reader. The modifications that one should make to the proof of Theorem 5.1 are pretty obvious.

Note that it follows from the above theorem that an orthogonal $2 \times 2$ matrix $U$ with determinant $-1$ is always a reflection.


Let us now fix an orthonormal basis, say the standard basis in $\mathbb{R}^n$. We call an elementary rotation³ a rotation in the $x_j$-$x_k$ plane, i.e. a linear transformation which changes only the coordinates $x_j$ and $x_k$, and it acts on these two coordinates as a plane rotation.


Theorem 5.3. Any rotation $U$ (i.e. an orthogonal transformation $U$ with $\det U = 1$) can be represented as a product of at most $n(n-1)/2$ elementary rotations.


To prove the theorem we will need the following simple lemmas.

Lemma 5.4. Let $x = (x_1, x_2)^T \in \mathbb{R}^2$. There exists a rotation $R_\alpha$ of $\mathbb{R}^2$ which moves the vector $x$ to the vector $(a, 0)^T$, where $a = \sqrt{x_1^2 + x_2^2}$.


The proof is elementary, and we leave it as an exercise for the reader. One can just draw a picture and/or write a formula for $R_\alpha$.


Lemma 5.5. Let $x = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$. There exist $n - 1$ elementary rotations $R_1, R_2, \dots, R_{n-1}$ such that $R_{n-1} \dots R_2 R_1 x = (a, 0, 0, \dots, 0)^T$, where $a = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$.


Proof. The idea of the proof of the lemma is very simple. We use an elementary rotation $R_1$ in the $x_{n-1}$-$x_n$ plane to “kill” the last coordinate of $x$ (Lemma 5.4 guarantees that such a rotation exists). Then we use an elementary rotation $R_2$ in the $x_{n-2}$-$x_{n-1}$ plane to “kill” the coordinate number $n-1$ of $R_1 x$ (the rotation $R_2$ does not change the last coordinate, so the last coordinate of $R_2R_1x$ remains zero), and so on...


For a formal proof we will use induction on $n$. The case $n = 1$ is trivial, since any vector in $\mathbb{R}^1$ has the desired form. The case $n = 2$ is treated by Lemma 5.4.


Assuming now that the lemma is true for $n - 1$, let us prove it for $n$. By Lemma 5.4 there exists a $2 \times 2$ rotation matrix $R_\alpha$ such that

$$R_\alpha \begin{pmatrix} x_{n-1} \\ x_n \end{pmatrix} = \begin{pmatrix} a_{n-1} \\ 0 \end{pmatrix},$$

where $a_{n-1} = \sqrt{x_{n-1}^2 + x_n^2}$. So if we define the $n \times n$ elementary rotation $R_1$ by

$$R_1 = \begin{pmatrix} I_{n-2} & 0 \\ 0 & R_\alpha \end{pmatrix}$$

($I_{n-2}$ is the $(n-2) \times (n-2)$ identity matrix), then

$$R_1 x = (x_1, x_2, \dots, x_{n-2}, a_{n-1}, 0)^T.$$

³ This term is not widely accepted.


We assumed that the conclusion of the lemma holds for $n - 1$, so there exist $n - 2$ elementary rotations (let us call them $R_2, R_3, \dots, R_{n-1}$) in $\mathbb{R}^{n-1}$ which transform the vector $(x_1, x_2, \dots, x_{n-2}, a_{n-1})^T \in \mathbb{R}^{n-1}$ to the vector $(a, 0, \dots, 0)^T \in \mathbb{R}^{n-1}$. In other words

$$R_{n-1} \dots R_3 R_2 (x_1, x_2, \dots, x_{n-2}, a_{n-1})^T = (a, 0, \dots, 0)^T.$$

We can always assume that the elementary rotations $R_2, R_3, \dots, R_{n-1}$ act in $\mathbb{R}^n$, simply by assuming that they do not change the last coordinate. Then

$$R_{n-1} \dots R_3 R_2 R_1 x = (a, 0, \dots, 0)^T \in \mathbb{R}^n.$$

Let us now show that $a = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$. It can be easily checked directly, but we apply the following indirect reasoning. We know that orthogonal transformations preserve the norm, and we know that $a \ge 0$. But then we do not have any choice: the only possibility for $a$ is $a = \sqrt{x_1^2 + x_2^2 + \dots + x_n^2}$.


Lemma 5.6. Let $A$ be an $n \times n$ matrix with real entries. There exist elementary rotations $R_1, R_2, \dots, R_N$, $N \le n(n-1)/2$, such that the matrix $B = R_N \dots R_2 R_1 A$ is upper triangular, and, moreover, all its diagonal entries except the last one $B_{n,n}$ are non-negative.


Proof. We will use induction on $n$. The case $n = 1$ is trivial, since we can say that any $1 \times 1$ matrix is of the desired form.


Let us consider the case $n = 2$. Let $a_1$ be the first column of $A$. By Lemma 5.4 there exists a rotation $R$ which “kills” the second coordinate of $a_1$, making the first coordinate non-negative. Then the matrix $B = RA$ is of the desired form.


Let us now assume that the lemma holds for $(n-1) \times (n-1)$ matrices, and we want to prove it for $n \times n$ matrices. For the $n \times n$ matrix $A$ let $a_1$ be its first column. By Lemma 5.5 we can find $n - 1$ elementary rotations (say $R_1, R_2, \dots, R_{n-1}$) which transform $a_1$ into $(a, 0, \dots, 0)^T$. So, the matrix $R_{n-1} \dots R_2 R_1 A$ has the following block triangular form

$$R_{n-1} \dots R_2 R_1 A = \begin{pmatrix} a & * \\ 0 & A_1 \end{pmatrix},$$

where $A_1$ is an $(n-1) \times (n-1)$ block.

We assumed that the lemma holds for $n - 1$, so $A_1$ can be transformed by at most $(n-1)(n-2)/2$ rotations into the desired upper triangular form. Note that these rotations act in $\mathbb{R}^{n-1}$ (only on the coordinates $x_2, x_3, \dots, x_n$), but we can always assume that they act on the whole $\mathbb{R}^n$ simply by assuming that they do not change the first coordinate. Then, these rotations do not change the vector $(a, 0, \dots, 0)^T$ (the first column of $R_{n-1} \dots R_2 R_1 A$), so the matrix $A$ can be transformed into the desired upper triangular form by at most $n - 1 + (n-1)(n-2)/2 = n(n-1)/2$ elementary rotations.
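The inductive argument of Lemmas 5.5 and 5.6 can be read as an algorithm. The following is a minimal sketch in plain Python, not taken from the text: the helper names `elementary_rotation`, `matmul` and `triangularize` are ours, indices are 0-based, and the rotation sending $(a, b)^T$ to $(\sqrt{a^2+b^2}, 0)^T$ is written out explicitly as one possible choice in Lemma 5.4.

```python
import math

def elementary_rotation(n, j, k, c, s):
    """n x n matrix acting as [[c, -s], [s, c]] on coordinates j, k (0-based)."""
    R = [[1.0 if r == q else 0.0 for q in range(n)] for r in range(n)]
    R[j][j], R[j][k] = c, -s
    R[k][j], R[k][k] = s, c
    return R

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def triangularize(A):
    """Return (B, rotations) with B = R_N ... R_2 R_1 A upper triangular."""
    n = len(A)
    B = [row[:] for row in A]
    rotations = []
    for col in range(n - 1):               # column whose subdiagonal we clear
        for row in range(n - 1, col, -1):  # "kill" entries from the bottom up
            a, b = B[row - 1][col], B[row][col]
            r = math.hypot(a, b)
            if r == 0.0:
                continue                   # both entries are zero already
            # rotation in the (row-1, row) plane sending (a, b) to (r, 0)
            c, s = a / r, -b / r
            R = elementary_rotation(n, row - 1, row, c, s)
            B = matmul(R, B)
            rotations.append(R)
    return B, rotations

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]]
B, rotations = triangularize(A)
```

Column `col` needs at most `n - 1 - col` rotations, which adds up to the bound $n(n-1)/2$ of the lemma. In floating point the entries below the diagonal of `B` come out as tiny numbers rather than exact zeroes.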


Proof of Theorem 5.3. By Lemma 5.6 there exist elementary rotations $R_1, R_2, \dots, R_N$ such that the matrix $U_1 = R_N \dots R_2 R_1 U$ is upper triangular, and all diagonal entries, except maybe the last one, are non-negative.


Note that the matrix $U_1$ is orthogonal. Any orthogonal matrix is normal, and we know that an upper triangular matrix can be normal only if it is diagonal. Therefore, $U_1$ is a diagonal matrix.


We know that a real eigenvalue of an orthogonal matrix can either be $1$ or $-1$, so we can have only $1$ or $-1$ on the diagonal of $U_1$. But we know that all diagonal entries of $U_1$, except maybe the last one, are non-negative, so all the diagonal entries of $U_1$, except maybe the last one, are $1$. The last diagonal entry can be $\pm 1$.


Since elementary rotations have determinant 1, we can conclude that $\det U_1 = \det U = 1$, so the last diagonal entry also must be $1$. So $U_1 = I$, and therefore $U$ can be represented as a product of elementary rotations $U = R_1^{-1} R_2^{-1} \dots R_N^{-1}$. Here we use the fact that the inverse of an elementary rotation is an elementary rotation as well.


6. Orientation 


6.1. Motivation. In Figures 1, 2 below we see 3 orthonormal bases in $\mathbb{R}^2$ and $\mathbb{R}^3$ respectively. In each figure, the basis b) can be obtained from the standard basis a) by a rotation, while it is impossible to rotate the standard basis to get the basis c) (so that $e_k$ goes to $v_k$ for all $k$).


You have probably heard the word “orientation” before, and you probably know that bases a) and b) have positive orientation, and orientation of the bases c) is negative. You also probably know some rules to determine the orientation, like the right hand rule from physics. So, if you can see a basis, say in $\mathbb{R}^3$, you probably can say what orientation it has.


Figure 1. Orientation in $\mathbb{R}^2$ (panels a, b, c)


But what if you are only given coordinates of the vectors $v_1, v_2, v_3$? Of course, you can try to draw a picture to visualize the vectors, and then to see what the orientation is. But this is not always easy. Moreover, how do you “explain” this to a computer?

It turns out that there is an easier way. Let us explain it. We need to check whether it is possible to get a basis $v_1, v_2, v_3$ in $\mathbb{R}^3$ by rotating the standard basis $e_1, e_2, e_3$. There is a unique linear transformation $U$ such that

$$Ue_k = v_k, \qquad k = 1, 2, 3;$$

its matrix (in the standard basis) is the matrix with columns $v_1, v_2, v_3$. It is an orthogonal matrix (because it transforms an orthonormal basis to an orthonormal basis), so we need to see when it is a rotation. Theorems 5.1 and 5.2 give us the answer: the matrix $U$ is a rotation if and only if $\det U = 1$. Note that (for $3 \times 3$ matrices) if $\det U = -1$, then $U$ is the composition of a rotation about some axis and a reflection in the plane of rotation, i.e. in the plane orthogonal to this axis.
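This determinant test is exactly what one can “explain” to a computer. A minimal sketch in plain Python follows; the function names `det3` and `is_rotation_of_standard_basis` are hypothetical, and the input vectors are assumed to form an orthonormal basis already (the code does not check this).

```python
def det3(v1, v2, v3):
    """Determinant of the 3 x 3 matrix whose columns are v1, v2, v3."""
    (a, d, g), (b, e, h), (c, f, i) = v1, v2, v3
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_rotation_of_standard_basis(v1, v2, v3, tol=1e-9):
    """True if the orthonormal basis v1, v2, v3 has determinant 1."""
    return abs(det3(v1, v2, v3) - 1.0) < tol

# e1, e2 rotated by 90 degrees about the third axis: positive orientation
rotated = is_rotation_of_standard_basis((0, 1, 0), (-1, 0, 0), (0, 0, 1))
# e1 and e2 interchanged: negative orientation
swapped = is_rotation_of_standard_basis((0, 1, 0), (1, 0, 0), (0, 0, 1))
```

The tolerance `tol` is an assumption made for floating-point input; for exact integer or rational coordinates the determinant is exactly $\pm 1$.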


This gives us a motivation for the formal definition below. 


6.2. Formal definition. Let $\mathcal{A}$ and $\mathcal{B}$ be two bases in a real vector space $X$. We say that the bases $\mathcal{A}$ and $\mathcal{B}$ have the same orientation if the change of coordinates matrix $[I]_{\mathcal{B},\mathcal{A}}$ has positive determinant, and say that they have different orientations if the determinant of $[I]_{\mathcal{B},\mathcal{A}}$ is negative.

Note that since $[I]_{\mathcal{A},\mathcal{B}} = [I]_{\mathcal{B},\mathcal{A}}^{-1}$, one can use the matrix $[I]_{\mathcal{A},\mathcal{B}}$ in the definition.


We usually assume that the standard basis $e_1, e_2, \dots, e_n$ in $\mathbb{R}^n$ has positive orientation. In an abstract space one just needs to fix a basis and declare that its orientation is positive.


Figure 2. Orientation in $\mathbb{R}^3$


If an orthonormal basis $v_1, v_2, \dots, v_n$ in $\mathbb{R}^n$ has positive orientation (i.e. the same orientation as the standard basis), Theorems 5.1 and 5.2 say that the basis $v_1, v_2, \dots, v_n$ is obtained from the standard basis by a rotation.


6.3. Continuous transformations of bases and orientation. 


Definition. We say that a basis $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$ can be continuously transformed to a basis $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ if there exists a continuous family of bases $\mathcal{V}(t) = \{v_1(t), v_2(t), \dots, v_n(t)\}$, $t \in [a, b]$, such that

$$v_k(a) = a_k, \qquad v_k(b) = b_k, \qquad k = 1, 2, \dots, n.$$

“Continuous family of bases” means that the vector-functions $v_k(t)$ are continuous (their coordinates in some basis are continuous functions) and, which is essential, the system $v_1(t), v_2(t), \dots, v_n(t)$ is a basis for all $t \in [a, b]$.


Note that, performing a change of variables, we can always assume, if necessary, that $[a, b] = [0, 1]$.


Theorem 6.1. Two bases $\mathcal{A} = \{a_1, a_2, \dots, a_n\}$ and $\mathcal{B} = \{b_1, b_2, \dots, b_n\}$ have the same orientation if and only if one of the bases can be continuously transformed to the other.


Proof. Suppose the basis $\mathcal{A}$ can be continuously transformed to the basis $\mathcal{B}$, and let $\mathcal{V}(t)$, $t \in [a, b]$, be a continuous family of bases performing this transformation. Consider a matrix-function $V(t)$ whose columns are the coordinate vectors $[v_k(t)]_{\mathcal{A}}$ of $v_k(t)$ in the basis $\mathcal{A}$.


Clearly, the entries of $V(t)$ are continuous functions and $V(a) = I$, $V(b) = [I]_{\mathcal{A},\mathcal{B}}$. Note that because $\mathcal{V}(t)$ is always a basis, $\det V(t)$ is never zero. Then the Intermediate Value Theorem asserts that $\det V(a)$ and $\det V(b)$ have the same sign. Since $\det V(a) = \det I = 1$, we can conclude that

$$\det [I]_{\mathcal{A},\mathcal{B}} = \det V(b) > 0,$$

so the bases $\mathcal{A}$ and $\mathcal{B}$ have the same orientation.


To prove the opposite implication, i.e. the “only if” part of the theorem, one needs to show that the identity matrix $I$ can be continuously transformed through invertible matrices to any matrix $B$ satisfying $\det B > 0$. In other words, that there exists a continuous matrix-function $V(t)$ on an interval $[a, b]$ such that for all $t \in [a, b]$ the matrix $V(t)$ is invertible and such that

$$V(a) = I, \qquad V(b) = B.$$

We leave the proof of this fact as an exercise for the reader. There are several ways to prove that, one of which is outlined in Problems 6.2–6.5 below.
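As a small illustration of the Intermediate Value Theorem argument in the first half of the proof, here is a sketch in plain Python that follows one particular path, the straight line $V(t) = (1-t)I + tB$, towards a matrix $B$ with $\det B < 0$. The path and the names `det2` and `straight_path` are our choice for illustration only; the theorem says that every continuous path, not just this one, must pass through a non-invertible matrix.

```python
from fractions import Fraction

def det2(M):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def straight_path(B, t):
    """V(t) = (1 - t) I + t B for a 2 x 2 matrix B."""
    I = [[1, 0], [0, 1]]
    return [[(1 - t) * I[r][c] + t * B[r][c] for c in range(2)]
            for r in range(2)]

B = [[0, 1], [1, 0]]   # columns e2, e1: the standard basis with vectors swapped
ts = [Fraction(k, 4) for k in range(5)]
dets = [det2(straight_path(B, t)) for t in ts]
# dets is [1, 1/2, 0, -1/2, -1]: the determinant changes sign, so it vanishes
# somewhere on the way, and V(1/2) is not a basis.
```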
Exercises.


6.1. Let $R_\alpha$ be the rotation through $\alpha$, so its matrix in the standard basis is

$$\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$

Find the matrix of $R_\alpha$ in the basis $v_1, v_2$, where $v_1 = e_2$, $v_2 = e_1$.


6.2. Let $R_\alpha$ be the rotation matrix

$$R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}.$$

Show that the $2 \times 2$ identity matrix $I_2$ can be continuously transformed through invertible matrices into $R_\alpha$.


6.3. Let $U$ be an $n \times n$ orthogonal matrix, and let $\det U > 0$. Show that the $n \times n$ identity matrix $I_n$ can be continuously transformed through invertible matrices into $U$. Hint: Use the previous problem and the representation of a rotation in $\mathbb{R}^n$ as a product of planar rotations, see Section 5.


6.4. Let $A$ be an $n \times n$ positive definite Hermitian matrix, $A = A^* > 0$. Show that the $n \times n$ identity matrix $I_n$ can be continuously transformed through invertible matrices into $A$. Hint: What about diagonal matrices?


6.5. Using polar decomposition and Problems 6.3, 6.4 above, complete the proof of the “only if” part of Theorem 6.1.


Chapter 7 


Bilinear and quadratic 
forms 


While the study of real quadratic forms (i.e. real homogeneous polynomials of degree 2) was probably the initial motivation for the subject of this chapter, complex quadratic forms $(Ax, x)$, $x \in \mathbb{C}^n$, $A = A^*$, are also of significant interest. So, unless otherwise specified, results and calculations hold in both the real and the complex case.


To avoid writing essentially the same formulas twice, we use the notation adapted to the complex case: in particular, in the real case the notation $A^*$ is used instead of $A^T$.


1. Main definition 


1.1. Bilinear forms on $\mathbb{R}^n$. A bilinear form on $\mathbb{R}^n$ is a function $L = L(x, y)$ of two arguments $x, y \in \mathbb{R}^n$ which is linear in each argument, i.e. such that

1. $L(\alpha x_1 + \beta x_2, y) = \alpha L(x_1, y) + \beta L(x_2, y)$;
2. $L(x, \alpha y_1 + \beta y_2) = \alpha L(x, y_1) + \beta L(x, y_2)$.

One can consider bilinear forms whose values belong to an arbitrary vector space, but in this book we only consider forms that take real values.


If $x = (x_1, x_2, \dots, x_n)^T$ and $y = (y_1, y_2, \dots, y_n)^T$, a bilinear form can be written as

$$L(x, y) = \sum_{j,k=1}^n a_{j,k} x_k y_j,$$

or in matrix form

$$L(x, y) = (Ax, y),$$

where

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \dots & a_{1,n} \\ a_{2,1} & a_{2,2} & \dots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \dots & a_{n,n} \end{pmatrix}.$$
The matrix A is determined uniquely by the bilinear form L. 
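Uniqueness can be seen by evaluating the form on the standard basis vectors: $a_{j,k} = L(e_k, e_j)$. A minimal sketch in plain Python (0-based indices; the names `bilinear` and `matrix_of` are hypothetical helpers, not notation from the text):

```python
def bilinear(A, x, y):
    """L(x, y) = (Ax, y) = sum over j, k of A[j][k] * x[k] * y[j]."""
    n = len(A)
    return sum(A[j][k] * x[k] * y[j] for j in range(n) for k in range(n))

def matrix_of(L, n):
    """Recover the matrix of a bilinear form L on R^n via a_{j,k} = L(e_k, e_j)."""
    e = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    return [[L(e[k], e[j]) for k in range(n)] for j in range(n)]

A = [[1, 2], [3, 4]]
recovered = matrix_of(lambda x, y: bilinear(A, x, y), 2)
# recovered coincides with A, entry by entry
```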


1.2. Quadratic forms on $\mathbb{R}^n$. There are several equivalent definitions of a quadratic form.

One can say that a quadratic form on $\mathbb{R}^n$ is the “diagonal” of a bilinear form $L$, i.e. that any quadratic form $Q$ is defined by $Q[x] = L(x, x) = (Ax, x)$.

Another, more algebraic way, is to say that a quadratic form is a homogeneous polynomial of degree 2, i.e. that $Q[x]$ is a polynomial of $n$ variables $x_1, x_2, \dots, x_n$ having only terms of degree 2. That means that only terms $a x_k^2$ and $c x_j x_k$ are allowed.

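There are many ways (in fact, infinitely many) to write a quadratic form $Q[x]$ as $Q[x] = (Ax, x)$. For example, the quadratic form $Q[x] = x_1^2 + x_2^2 - 4x_1x_2$ on $\mathbb{R}^2$ can be represented as $(Ax, x)$ where $A$ can be any of the matrices

$$\begin{pmatrix} 1 & -4 \\ 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 \\ -4 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & -2 \\ -2 & 1 \end{pmatrix}.$$

In fact, any matrix $A$ of the form

$$\begin{pmatrix} 1 & a - 4 \\ -a & 1 \end{pmatrix}$$

will work.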


But if we require the matrix $A$ to be symmetric, then such a matrix is unique:


Any quadratic form $Q[x]$ on $\mathbb{R}^n$ admits a unique representation $Q[x] = (Ax, x)$ where $A$ is a (real) symmetric matrix.


For example, for the quadratic form

$$Q[x] = x_1^2 + 3x_2^2 + 5x_3^2 + 4x_1x_2 - 16x_1x_3 + 7x_2x_3$$

on $\mathbb{R}^3$, the corresponding symmetric matrix $A$ is

$$\begin{pmatrix} 1 & 2 & -8 \\ 2 & 3 & 3.5 \\ -8 & 3.5 & 5 \end{pmatrix}.$$
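The rule behind this example is mechanical: the coefficient of $x_j^2$ goes on the diagonal, and the coefficient of $x_jx_k$, $j \ne k$, is split equally between the entries $(j,k)$ and $(k,j)$. A sketch in plain Python, where the dictionary encoding of the polynomial and the name `symmetric_matrix` are our own conventions:

```python
from fractions import Fraction

def symmetric_matrix(n, coeffs):
    """Symmetric matrix of a quadratic form on R^n.

    coeffs maps 1-based index pairs (j, k) with j <= k to the coefficient
    of x_j * x_k in the polynomial Q[x].
    """
    A = [[Fraction(0)] * n for _ in range(n)]
    for (j, k), c in coeffs.items():
        if j == k:
            A[j - 1][j - 1] += Fraction(c)
        else:
            # split the mixed term equally between the two symmetric entries
            A[j - 1][k - 1] += Fraction(c, 2)
            A[k - 1][j - 1] += Fraction(c, 2)
    return A

# the form with symmetric matrix [[1, 2, -8], [2, 3, 3.5], [-8, 3.5, 5]]
coeffs = {(1, 1): 1, (2, 2): 3, (3, 3): 5, (1, 2): 4, (1, 3): -16, (2, 3): 7}
A = symmetric_matrix(3, coeffs)
```

Exact fractions are used so that the entry $3.5$ appears as $7/2$ without rounding.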
1.3. Quadratic forms on $\mathbb{C}^n$. One can also define a quadratic form on $\mathbb{C}^n$ (or any complex inner product space) by taking a self-adjoint transformation $A = A^*$ and defining $Q$ by $Q[x] = (Ax, x)$. While our main examples will be in $\mathbb{R}^n$, all the theorems are true in the setting of $\mathbb{C}^n$ as well. Bearing this in mind, we will always use $A^*$ instead of $A^T$.


The only essential difference with the real case is that in the complex 
case we do not have any freedom of choice: if the quadratic form is real, the 
corresponding matrix has to be Hermitian (self-adjoint). 


Note that if $A = A^*$ then

$$(Ax, x) = (x, A^*x) = (x, Ax) = \overline{(Ax, x)},$$

so $(Ax, x) \in \mathbb{R}$.

The converse is also true.


Lemma 1.1. Let $(Ax, x)$ be real for all $x \in \mathbb{C}^n$. Then $A = A^*$.


We leave the proof as an exercise for the reader, see Problem 1.4 below. 


One of the possible ways to prove Lemma 1.1 is to use the following 
version of polarization identities. 


Lemma 1.2. Let $A$ be an operator in an inner product space $X$.

1. If $X$ is a complex space, then for any $x, y \in X$

$$(Ax, y) = \frac14 \sum_{\alpha \in \mathbb{C}:\, \alpha^4 = 1} \alpha\, (A(x + \alpha y), x + \alpha y).$$

2. If $X$ is a real space and $A = A^*$, then for any $x, y \in X$

$$(Ax, y) = \frac14 \bigl[ (A(x + y), x + y) - (A(x - y), x - y) \bigr].$$


For the proof of Lemma 1.2 see Exercise 6.3 in Chapter 5 above. 
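The complex polarization identity is easy to test numerically. The sketch below, in plain Python, assumes the standard inner product in $\mathbb{C}^n$ that is linear in the first argument and conjugate-linear in the second; the helper names `inner`, `apply` and `polarization` are ours. Note that the matrix used here is deliberately not Hermitian, since part 1 of the lemma does not require $A = A^*$.

```python
def inner(x, y):
    """Standard inner product in C^n, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(A[j][k] * x[k] for k in range(len(x))) for j in range(len(A))]

def polarization(A, x, y):
    """(1/4) * sum over alpha with alpha**4 = 1 of alpha * (A(x+alpha y), x+alpha y)."""
    total = 0
    for alpha in (1, -1, 1j, -1j):       # the four roots of alpha**4 = 1
        z = [a + alpha * b for a, b in zip(x, y)]
        total += alpha * inner(apply(A, z), z)
    return total / 4

A = [[1 + 2j, 3], [0, -1j]]
x = [1, 2j]
y = [1 - 1j, 3]
lhs = inner(apply(A, x), y)
rhs = polarization(A, x, y)
# lhs and rhs agree up to rounding
```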


Exercises.

1.1. Find the matrix of the bilinear form $L$ on $\mathbb{R}^3$,

$$L(x, y) = x_1y_1 + 2x_1y_2 + 14x_1y_3 - 5x_2y_1 + 2x_2y_2 - 3x_2y_3 + 8x_3y_1 + 19x_3y_2 - 2x_3y_3.$$

1.2. Define the bilinear form $L$ on $\mathbb{R}^2$ by

$$L(x, y) = \det[x, y],$$

i.e. to compute $L(x, y)$ we form a $2 \times 2$ matrix with columns $x$, $y$ and compute its determinant.

Find the matrix of $L$.

1.3. Find the matrix of the quadratic form $Q$ on $\mathbb{R}^3$

$$Q[x] = x_1^2 + 2x_1x_2 - 3x_1x_3 - 9x_2^2 + 6x_2x_3 + 13x_3^2.$$

1.4. Prove Lemma 1.1 above.

Hint: Use the polarization identity, see Lemma 1.2. Alternatively, you can consider the expression $(A(x + zy), x + zy)$ and show that if it is real for all $z \in \mathbb{C}$ then $(Ax, y) = \overline{(y, A^*x)}$.


2. Diagonalization of quadratic forms 


You have probably met quadratic forms before, when you studied second order curves in the plane. Maybe you even studied the second order surfaces in $\mathbb{R}^3$.

We want to present a unified approach to classification of such objects. Suppose we are given a set in $\mathbb{R}^n$ defined by the equation $Q[x] = 1$, where $Q$ is some quadratic form. If $Q$ has some simple form, for example if the corresponding matrix is diagonal, i.e. if $Q[x] = a_1x_1^2 + a_2x_2^2 + \dots + a_nx_n^2$, we can easily visualize this set, especially if $n = 2, 3$. In higher dimensions, it is also possible, if not to visualize, then to understand the structure of the set very well.


So, if we are given a general, complicated quadratic form, we want to 
simplify it as much as possible, for example to make it diagonal. The stan- 
dard way of doing that is the change of variables. 


2.1. Orthogonal diagonalization. Let us have a quadratic form $Q[x] = (Ax, x)$ in $\mathbb{F}^n$ ($\mathbb{F}$ is $\mathbb{R}$ or $\mathbb{C}$). Introduce new variables $y = (y_1, y_2, \dots, y_n)^T \in \mathbb{F}^n$, with $y = S^{-1}x$, where $S$ is some invertible $n \times n$ matrix, so $x = Sy$.


Then

$$Q[x] = Q[Sy] = (ASy, Sy) = (S^*ASy, y),$$

so in the new variables $y$ the quadratic form has matrix $S^*AS$.


So, we want to find an invertible matrix $S$ such that the matrix $S^*AS$ is diagonal. Note that it is different from the diagonalization of matrices we had before: we tried to represent a matrix $A$ as $A = SDS^{-1}$, so the matrix $D = S^{-1}AS$ was diagonal. However, for unitary matrices $U$, we have $U^* = U^{-1}$, and we can orthogonally diagonalize symmetric matrices. So we can apply the orthogonal diagonalization we studied before to the quadratic forms.


Namely, we can represent the matrix $A$ as $A = UDU^* = UDU^{-1}$. Recall that $D$ is a diagonal matrix with eigenvalues of $A$ on the diagonal, and $U$ is the matrix of eigenvectors (we need to pick an orthonormal basis of eigenvectors). Then $D = U^*AU$, so in the variables $y = U^{-1}x$ the quadratic form has a diagonal matrix.

Let us analyze the geometric meaning of the orthogonal diagonalization. The columns $u_1, u_2, \dots, u_n$ of the unitary matrix $U$ form an orthonormal basis in $\mathbb{F}^n$; let us call this basis $\mathcal{B}$. The change of coordinate matrix $[I]_{\mathcal{S},\mathcal{B}}$ from this basis to the standard one is exactly $U$. We know that $y = (y_1, y_2, \dots, y_n)^T = U^{-1}x$, so the coordinates $y_1, y_2, \dots, y_n$ can be interpreted as coordinates of the vector $x$ in the new basis $u_1, u_2, \dots, u_n$.


So, orthogonal diagonalization allows us to visualize very well the set $Q[x] = 1$, or a similar one, as long as we can visualize it for diagonal matrices.


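Example. Consider the quadratic form of two variables (i.e. quadratic form on $\mathbb{R}^2$), $Q(x, y) = 2x^2 + 2y^2 + 2xy$. Let us describe the set of points $(x, y)^T \in \mathbb{R}^2$ satisfying $Q(x, y) = 1$.

The matrix $A$ of $Q$ is

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.$$

Orthogonally diagonalizing this matrix we can represent it as

$$A = U \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} U^*, \qquad \text{where } U = \frac{1}{\sqrt2} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

or, equivalently

$$U^*AU = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix} =: D.$$

The set $\{y : (Dy, y) = 1\}$ is the ellipse with half-axes $1/\sqrt3$ and $1$. Therefore the set $\{x \in \mathbb{R}^2 : (Ax, x) = 1\}$ is the same ellipse only in the basis $(\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2})^T$, $(-\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2})^T$, or, equivalently, the same ellipse, rotated by $\pi/4$.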
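The computation in this example can be checked numerically. The sketch below, in plain Python, assumes the matrices $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$ and $U = \frac{1}{\sqrt2}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ of the example; the helper names `matmul` and `transpose` are ours.

```python
import math

def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][m] * Y[m][j] for m in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
U = [[s, -s], [s, s]]        # columns: eigenvectors for the eigenvalues 3 and 1
D = matmul(transpose(U), matmul(A, U))   # U is real, so U* is the transpose

def Q(x, y):
    return 2 * x * x + 2 * y * y + 2 * x * y

# End of the short half-axis: coordinates (1/sqrt(3), 0) in the basis u1, u2,
# i.e. the point (1/sqrt(3)) * u1 in standard coordinates.
px, py = s / math.sqrt(3), s / math.sqrt(3)
```

Up to rounding, `D` is the diagonal matrix with entries 3 and 1, and the point `(px, py)` satisfies `Q(px, py) = 1`, so it lies on the ellipse.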

2.2. Non-orthogonal diagonalization. Orthogonal diagonalization involves computing eigenvalues and eigenvectors, so it may be difficult to do without computers for large $n$. On the other hand, the non-orthogonal diagonalization, i.e. finding an invertible $S$ (without requiring $S^{-1} = S^*$) such that $D = S^*AS$ is diagonal, is much easier computationally and can be done using only algebraic operations (addition, subtraction, multiplication, division).


Below we present the two most used methods of non-orthogonal diagonalization.


2.2.1. Diagonalization by completion of squares. The first method is based on completion of squares. We will illustrate this method on real quadratic forms (forms on $\mathbb{R}^n$). After simple modifications this method could be used in the complex case, but we will not discuss it here. If necessary, an interested reader should be able to make the appropriate modifications.


Let us again consider the quadratic form of two variables, $Q[x] = 2x_1^2 + 2x_1x_2 + 2x_2^2$ (it is the same quadratic form as in the above example, only here we call the variables not $x, y$ but $x_1, x_2$). Since

$$2\left(x_1 + \tfrac12 x_2\right)^2 = 2\left(x_1^2 + 2x_1 \cdot \tfrac12 x_2 + \tfrac14 x_2^2\right) = 2x_1^2 + 2x_1x_2 + \tfrac12 x_2^2$$

(note that the first two terms coincide with the first two terms of $Q$), we get

$$2x_1^2 + 2x_1x_2 + 2x_2^2 = 2\left(x_1 + \tfrac12 x_2\right)^2 + \tfrac32 x_2^2 = 2y_1^2 + \tfrac32 y_2^2,$$

where $y_1 = x_1 + \tfrac12 x_2$ and $y_2 = x_2$.
The same method can be applied to quadratic forms of more than 2 variables. Let us consider, for example, a form $Q[x]$ in $\mathbb{R}^3$,

$$Q[x] = x_1^2 - 6x_1x_2 + 4x_1x_3 - 6x_2x_3 + 8x_2^2 - 3x_3^2.$$

Considering all terms involving the first variable $x_1$ (the first 3 terms in this case), let us pick a full square or a multiple of a full square which has exactly these terms (plus some other terms).

Since

$$(x_1 - 3x_2 + 2x_3)^2 = x_1^2 - 6x_1x_2 + 4x_1x_3 - 12x_2x_3 + 9x_2^2 + 4x_3^2,$$

we can rewrite the quadratic form as

$$(x_1 - 3x_2 + 2x_3)^2 - x_2^2 + 6x_2x_3 - 7x_3^2.$$

Note that the expression $-x_2^2 + 6x_2x_3 - 7x_3^2$ involves only the variables $x_2$ and $x_3$. Since

$$-(x_2 - 3x_3)^2 = -(x_2^2 - 6x_2x_3 + 9x_3^2) = -x_2^2 + 6x_2x_3 - 9x_3^2,$$

we have

$$-x_2^2 + 6x_2x_3 - 7x_3^2 = -(x_2 - 3x_3)^2 + 2x_3^2.$$

Thus we can write the quadratic form $Q$ as

$$Q[x] = (x_1 - 3x_2 + 2x_3)^2 - (x_2 - 3x_3)^2 + 2x_3^2 = y_1^2 - y_2^2 + 2y_3^2,$$

where

$$y_1 = x_1 - 3x_2 + 2x_3, \qquad y_2 = x_2 - 3x_3, \qquad y_3 = x_3.$$
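A completion of squares is easy to get wrong by a sign, so it is worth checking the result. The sketch below, in plain Python, compares the form $Q[x] = x_1^2 - 6x_1x_2 + 4x_1x_3 - 6x_2x_3 + 8x_2^2 - 3x_3^2$ with $y_1^2 - y_2^2 + 2y_3^2$ on a grid of integer points; two quadratic polynomials that agree on such a grid are identical. The function names `Q` and `Q_diag` are ours.

```python
def Q(x1, x2, x3):
    """The quadratic form in the original variables."""
    return (x1**2 - 6*x1*x2 + 4*x1*x3 - 6*x2*x3 + 8*x2**2 - 3*x3**2)

def Q_diag(x1, x2, x3):
    """The same form after the change of variables found by completing squares."""
    y1 = x1 - 3*x2 + 2*x3
    y2 = x2 - 3*x3
    y3 = x3
    return y1**2 - y2**2 + 2*y3**2

grid = range(-3, 4)
identity_holds = all(Q(a, b, c) == Q_diag(a, b, c)
                     for a in grid for b in grid for c in grid)
```

Integer arithmetic is exact here, so `identity_holds` being true is a genuine check and not a floating-point approximation.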


Finally, let us address the question that an attentive reader is probably already asking: what to do if at some point we do have a product of two variables, but no corresponding squares? For example, how to diagonalize the form $x_1x_2$? The answer follows immediately from the identity

$$4x_1x_2 = (x_1 + x_2)^2 - (x_1 - x_2)^2, \tag{2.1}$$

which gives us the representation

$$Q[x] = y_1^2 - y_2^2, \qquad y_1 = (x_1 + x_2)/2, \quad y_2 = (x_1 - x_2)/2.$$
2.2.2. Diagonalization using row/column operations. There is another way of performing non-orthogonal diagonalization of a quadratic form. The idea is to perform row operations on the matrix $A$ of the quadratic form. The difference with the row reduction (Gauss–Jordan elimination) is that after each row operation we need to perform the same column operation, the reason for that being that we want to make the matrix $S^*AS$ diagonal.


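Let us explain how everything works on an example. Suppose we want to diagonalize a quadratic form with matrix

$$A = \begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}.$$

We augment the matrix $A$ by the identity matrix, and perform on the augmented matrix $(A|I)$ row/column operations. After each row operation we have to perform on the matrix $A$ the same column operation.¹ We get

$$
\left(\begin{array}{rrr|rrr} 1 & -1 & 3 & 1 & 0 & 0 \\ -1 & 2 & 1 & 0 & 1 & 0 \\ 3 & 1 & 1 & 0 & 0 & 1 \end{array}\right)
\begin{matrix} \ \\ +R_1 \\ \ \end{matrix}
\sim
\left(\begin{array}{rrr|rrr} 1 & -1 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 1 & 1 & 0 & 0 & 1 \end{array}\right) \sim
$$

$$
\left(\begin{array}{rrr|rrr} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 3 & 4 & 1 & 0 & 0 & 1 \end{array}\right)
\begin{matrix} \ \\ \ \\ -3R_1 \end{matrix}
\sim
\left(\begin{array}{rrr|rrr} 1 & 0 & 3 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 0 & 4 & -8 & -3 & 0 & 1 \end{array}\right) \sim
$$

$$
\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 0 & 4 & -8 & -3 & 0 & 1 \end{array}\right)
\begin{matrix} \ \\ \ \\ -4R_2 \end{matrix}
\sim
\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 1 & 0 \\ 0 & 0 & -24 & -7 & -4 & 1 \end{array}\right) \sim
$$

$$
\left(\begin{array}{rrr|rrr} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 & 0 \\ 0 & 0 & -24 & -7 & -4 & 1 \end{array}\right).
$$

Note that we perform column operations only on the left side of the augmented matrix.

We get the diagonal matrix $D$ on the left, and the matrix $S^*$ on the right, so $D = S^*AS$,

$$
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -24 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -7 & -4 & 1 \end{pmatrix}
\begin{pmatrix} 1 & -1 & 3 \\ -1 & 2 & 1 \\ 3 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & -7 \\ 0 & 1 & -4 \\ 0 & 0 & 1 \end{pmatrix}.
$$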


Let us explain why the method works. A row operation is a left multiplication by an elementary matrix. The corresponding column operation is the right multiplication by the transposed elementary matrix. Therefore, performing row operations $E_1, E_2, \dots, E_N$ and the same column operations we transform the matrix $A$ to

$$E_N \dots E_2 E_1 A E_1^* E_2^* \dots E_N^* = EAE^*. \tag{2.2}$$

As for the identity matrix in the right side, we performed only row operations on it, so the identity matrix is transformed to

$$E_N \dots E_2 E_1 I = EI = E.$$

If we now denote $E^* = S$ we get that $S^*AS$ is a diagonal matrix, and the matrix $E = S^*$ is the right half of the transformed augmented matrix.

¹ In the case of complex Hermitian matrices we perform for each row operation the conjugate of the corresponding column operation, see Remark 2.1 below.
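The procedure can be sketched in a few lines of plain Python. This is a simplified illustration for real symmetric matrices, not a complete algorithm: it assumes that every pivot it meets is non-zero and raises an error otherwise, which is exactly the situation discussed next. The function name `congruence_diagonalize` is hypothetical; exact rational arithmetic from the standard `fractions` module is used to avoid rounding.

```python
from fractions import Fraction

def congruence_diagonalize(A):
    """Return (D, E) with D = E A E^T diagonal, for a real symmetric matrix A.

    E is what the identity half of the augmented matrix (A | I) turns into,
    so E = S* in the notation of the text. Assumes non-zero pivots.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    E = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for p in range(n):
        if M[p][p] == 0:
            if any(M[r][p] != 0 for r in range(p + 1, n)):
                raise ValueError("zero pivot: another row operation is needed first")
            continue                     # this row and column are already clear
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            if f == 0:
                continue
            # row operation R_r -= f * R_p on both halves of (A | I)
            M[r] = [a - f * b for a, b in zip(M[r], M[p])]
            E[r] = [a - f * b for a, b in zip(E[r], E[p])]
            # the same column operation C_r -= f * C_p, on the left half only
            for i in range(n):
                M[i][r] -= f * M[i][p]
    return M, E

A = [[1, -1, 3], [-1, 2, 1], [3, 1, 1]]
D, E = congruence_diagonalize(A)
# D is diag{1, 1, -24} and E has rows (1, 0, 0), (1, 1, 0), (-7, -4, 1)
```

For the matrix of the worked example the sketch performs the same three row/column steps as the hand computation and returns the same $D$ and $S^*$.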


In the above example we got lucky, because we did not need to interchange two rows. This operation is a bit trickier to perform. It is quite simple if you know what to do, but it may be hard to guess the correct row operations. Let us consider, for example, a quadratic form with the matrix

$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$

If we want to diagonalize it by row and column operations, the simplest idea would be to interchange rows 1 and 2. But we also must perform the same column operation, i.e. interchange columns 1 and 2, so we will end up with the same matrix.


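So, we need something more non-trivial. The identity (2.1), for example, can be used to diagonalize this quadratic form. However, a simpler idea also works: use row operations to get a non-zero entry on the diagonal! For example, if we start with making $a_{1,1}$ non-zero, the following series of row (and the corresponding column) operations is one of the possible choices:

$$
\left(\begin{array}{rr|rr} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{array}\right)
\begin{matrix} +\tfrac12 R_2 \\ \ \end{matrix}
\sim
\left(\begin{array}{rr|rr} 1/2 & 1 & 1 & 1/2 \\ 1 & 0 & 0 & 1 \end{array}\right) \sim
$$

$$
\left(\begin{array}{rr|rr} 1 & 1 & 1 & 1/2 \\ 1 & 0 & 0 & 1 \end{array}\right)
\begin{matrix} \ \\ -R_1 \end{matrix}
\sim
\left(\begin{array}{rr|rr} 1 & 1 & 1 & 1/2 \\ 0 & -1 & -1 & 1/2 \end{array}\right) \sim
\left(\begin{array}{rr|rr} 1 & 0 & 1 & 1/2 \\ 0 & -1 & -1 & 1/2 \end{array}\right).
$$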
Remark. Non-orthogonal diagonalization gives us a simple description of a set $Q[x] = 1$ in a non-orthogonal basis. It is harder to visualize than the representation given by the orthogonal diagonalization. However, if we are not interested in the details, for example if it is sufficient for us just to know that the set is an ellipsoid (or hyperboloid, etc.), then the non-orthogonal diagonalization is an easier way to get the answer.


Remark 2.1. For quadratic forms with complex entries (i.e. for forms $(Ax, x)$, $A = A^*$), the non-orthogonal diagonalization works the same way as in the real case, with the only difference that the corresponding “column operations” have the complex conjugate coefficients.


The reason for that is that if a row operation is given by left multiplication by an elementary matrix $E_k$, then the corresponding column operation is given by the right multiplication by $E_k^*$, see (2.2).

Note that formula (2.2) works in both the complex and the real case: in the real case we could write $E_k^T$ instead of $E_k^*$, but using the Hermitian adjoint allows us to have the same formula in both cases.
Exercises.


2.1. Diagonalize the quadratic form with the matrix

$$A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ 1 & 2 & 1 \end{pmatrix}.$$

Use two methods: completion of squares and row operations. Which one do you like better?

Can you say if the matrix $A$ is positive definite or not?

2.2. For the matrix $A$

$$A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}$$

orthogonally diagonalize the corresponding quadratic form, i.e. find a diagonal matrix $D$ and a unitary matrix $U$ such that $D = U^*AU$.


3. Sylvester’s Law of Inertia 


As we discussed above, there are many ways to diagonalize a quadratic form. Note that a resulting diagonal matrix is not unique. For example, if we got a diagonal matrix

$$D = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\},$$

we can take a diagonal matrix

$$S = \operatorname{diag}\{s_1, s_2, \dots, s_n\}, \qquad s_k \in \mathbb{R}, \ s_k \ne 0,$$

and transform $D$ to

$$S^*DS = \operatorname{diag}\{s_1^2\lambda_1, s_2^2\lambda_2, \dots, s_n^2\lambda_n\}.$$

This transformation changes the diagonal entries of $D$. However, it does not change the signs of the diagonal entries. And this is always the case!
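The rescaling above can be tried out directly. The sketch below, in plain Python, rescales a diagonal by the squares $s_k^2$ and counts positive, negative and zero entries before and after; the names `signs` and `rescale` and the sample numbers are ours, chosen only for illustration.

```python
from fractions import Fraction

def signs(diagonal):
    """Return the numbers of (positive, negative, zero) diagonal entries."""
    pos = sum(1 for d in diagonal if d > 0)
    neg = sum(1 for d in diagonal if d < 0)
    return pos, neg, len(diagonal) - pos - neg

def rescale(diagonal, s):
    """Diagonal of S* D S for a real diagonal S = diag{s_1, ..., s_n}, s_k != 0."""
    return [sk * sk * d for sk, d in zip(s, diagonal)]

D = [3, -2, 0, 5]
S = [Fraction(1, 2), -7, 4, 3]
D_new = rescale(D, S)          # entries 3/4, -98, 0, 45
```

The entries change, but `signs(D)` and `signs(D_new)` coincide. The same count also agrees for the two diagonalizations of $2x_1^2 + 2x_1x_2 + 2x_2^2$ obtained earlier in this chapter, $\operatorname{diag}\{3, 1\}$ and $\operatorname{diag}\{2, 3/2\}$: both have two positive entries.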


Namely, the famous Sylvester’s Law of Inertia states that:


For a Hermitian matrix $A$ (i.e. for a quadratic form $Q[x] = (Ax, x)$) and any of its diagonalizations $D = S^*AS$, the number of positive (negative, zero) diagonal entries of $D$ depends only on $A$, but not on a particular choice of diagonalization.


Here we of course assume that $S$ is an invertible matrix, and $D$ is a diagonal one.

The idea of the proof of Sylvester’s Law of Inertia is to express the number of positive (negative, zero) diagonal entries of a diagonalization $D = S^*AS$ in terms of $A$, not involving $S$ or $D$.


We will need the following definition. 
Definition. Given an $n \times n$ Hermitian matrix $A = A^*$ (a quadratic form $Q[x] = (Ax, x)$ on $\mathbb{F}^n$) we call a subspace $E \subset \mathbb{F}^n$ positive (resp. negative, resp. neutral) if

$$(Ax, x) > 0 \qquad (\text{resp. } (Ax, x) < 0, \text{ resp. } (Ax, x) = 0)$$

for all $x \in E$, $x \ne 0$.

Sometimes, to emphasize the role of $A$ we will say $A$-positive ($A$-negative, $A$-neutral).

Theorem 3.1. Let $A$ be an $n \times n$ Hermitian matrix, and let $D = S^*AS$ be its diagonalization by an invertible matrix $S$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of an $A$-positive (resp. $A$-negative) subspace.


The above theorem says that if $r_+$ is the number of positive diagonal entries of $D$, then there exists an $A$-positive subspace $E$ of dimension $r_+$, but it is impossible to find a positive subspace $E$ with $\dim E > r_+$.

We will need the following lemma, which can be considered a particular case of the above theorem.


Lemma 3.2. Let $D$ be a diagonal matrix $D = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$. Then the number of positive (resp. negative) diagonal entries of $D$ coincides with the maximal dimension of a $D$-positive (resp. $D$-negative) subspace.


Proof. By rearranging the standard basis in $\mathbb{F}^n$ (changing the
numeration) we can always assume without loss of generality that the
positive diagonal entries of $D$ are the first $r_+$ diagonal entries.

Consider the subspace $E_+$ spanned by the first $r_+$ coordinate vectors
$e_1, e_2, \ldots, e_{r_+}$. Clearly $E_+$ is a $D$-positive subspace, and
$\dim E_+ = r_+$.

Let us now show that for any other $D$-positive subspace $E$ we have
$\dim E \le r_+$. Consider the orthogonal projection $P = P_{E_+}$,
$$Px = (x_1, x_2, \ldots, x_{r_+}, 0, \ldots, 0)^T, \qquad x = (x_1, x_2, \ldots, x_n)^T.$$


For a $D$-positive subspace $E$ define an operator $T : E \to E_+$ by
$$Tx = Px, \qquad \forall x \in E.$$

In other words, $T$ is the restriction of the projection $P$: $P$ is
defined on the whole space, but we restricted its domain to $E$ and target
space to $E_+$. We got an operator acting from $E$ to $E_+$, and we use a
different letter to distinguish it from $P$.

Note, that $\operatorname{Ker} T = \{0\}$. Indeed, let for
$x = (x_1, x_2, \ldots, x_n)^T \in E$ we have $Tx = Px = 0$. Then, by the
definition of $P$
$$x_1 = x_2 = \ldots = x_{r_+} = 0,$$
and therefore
$$(Dx, x) = \sum_{k = r_+ + 1}^{n} \lambda_k |x_k|^2 \le 0 \qquad (\lambda_k \le 0 \text{ for } k > r_+).$$
But $x$ belongs to a $D$-positive subspace $E$, so the inequality
$(Dx, x) \le 0$ holds only for $x = 0$.

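Let us now apply the Rank Theorem (Theorem 7.1 from Chapter 2). First of
all, $\operatorname{rank} T = \dim \operatorname{Ran} T \le \dim E_+ = r_+$
because $\operatorname{Ran} T \subset E_+$. By the Rank Theorem,
$\dim \operatorname{Ker} T + \operatorname{rank} T = \dim E$. But we just
proved that $\operatorname{Ker} T = \{0\}$, i.e. that
$\dim \operatorname{Ker} T = 0$, so
$$\dim E = \operatorname{rank} T \le \dim E_+ = r_+.$$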


To prove the statement about negative entries, we just apply the above 
reasoning to the matrix —D. 


Proof of Theorem 3.1. Let $D = S^*AS$ be a diagonalization of $A$. Since
$$(Dx, x) = (S^*ASx, x) = (ASx, Sx)$$
it follows that for any $D$-positive subspace $E$, the subspace $SE$ is an
$A$-positive subspace. The same identity implies that for any $A$-positive
subspace $F$ the subspace $S^{-1}F$ is $D$-positive.

Since $S$ and $S^{-1}$ are invertible transformations, $\dim E = \dim SE$
and $\dim F = \dim S^{-1}F$. Therefore, for any $D$-positive subspace $E$ we
can find an $A$-positive subspace (namely $SE$) of the same dimension, and
vice versa: for any $A$-positive subspace $F$ we can find a $D$-positive
subspace (namely $S^{-1}F$) of the same dimension. Therefore the maximal
possible dimensions of an $A$-positive and a $D$-positive subspace coincide,
and the theorem is proved.

The case of negative diagonal entries is treated similarly; we leave the
details as an exercise to the reader.
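To see the statement of Sylvester's Law of Inertia on a concrete case, here is a small sketch in Python. The matrix `A` and the two invertible matrices `S1`, `S2` are hypothetical examples chosen for illustration (they are not taken from the text), and the helper names are ours. Both diagonalizations give different diagonal entries, but the same number of positive and negative ones.

```python
from fractions import Fraction

def congruence(S, A):
    """Return S^T A S for real square matrices given as lists of rows."""
    n = len(A)
    AS = [[sum(A[i][k] * S[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(S[k][i] * AS[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def signs(D):
    """Count (positive, negative, zero) diagonal entries of a diagonal matrix."""
    diag = [D[i][i] for i in range(len(D))]
    return (sum(d > 0 for d in diag),
            sum(d < 0 for d in diag),
            sum(d == 0 for d in diag))

A = [[0, 1], [1, 0]]                    # the quadratic form 2*x1*x2
S1 = [[1, 1], [1, -1]]                  # invertible (hypothetical choice)
S2 = [[1, Fraction(1, 2)], [2, -1]]     # another invertible matrix

D1 = congruence(S1, A)                  # diag(2, -2)
D2 = congruence(S2, A)                  # diag(4, -1)
print(signs(D1), signs(D2))
```

The diagonal entries differ (2, -2 versus 4, -1), while the counts of positive, negative and zero entries agree, as the theorem asserts.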


4. Positive definite forms. Minimax characterization of 
eigenvalues and the Sylvester’s criterion of positivity 


Definition. A quadratic form $Q$ is called
• Positive definite if $Q[x] > 0$ for all $x \ne 0$.
• Positive semidefinite if $Q[x] \ge 0$ for all $x$.
• Negative definite if $Q[x] < 0$ for all $x \ne 0$.
• Negative semidefinite if $Q[x] \le 0$ for all $x$.
• Indefinite if it takes both positive and negative values, i.e. if there
  exist vectors $x_1$ and $x_2$ such that $Q[x_1] > 0$ and $Q[x_2] < 0$.

Definition. A Hermitian matrix $A = A^*$ is called positive definite
(negative definite, etc.) if the corresponding quadratic form
$Q[x] = (Ax, x)$ is positive definite (negative definite, etc.).


Theorem 4.1. Let $A = A^*$. Then

1. $A$ is positive definite iff all eigenvalues of $A$ are positive.
2. $A$ is positive semidefinite iff all eigenvalues of $A$ are non-negative.
3. $A$ is negative definite iff all eigenvalues of $A$ are negative.
4. $A$ is negative semidefinite iff all eigenvalues of $A$ are non-positive.
5. $A$ is indefinite iff it has both positive and negative eigenvalues.

Proof. The proof follows trivially from the orthogonal diagonalization.
Indeed, there is an orthonormal basis in which the matrix of $A$ is
diagonal, and for diagonal matrices the theorem is trivial.


Remark. Note, that to find whether a matrix (a quadratic form) is positive 
definite (negative definite, etc) one does not have to compute eigenvalues. 
By Sylvester’s Law of Inertia it is sufficient to perform an arbitrary, not 
necessarily orthogonal diagonalization D = S* AS and look at the diagonal 
entries of D. 
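The remark above can be illustrated by a short Python sketch. It diagonalizes a real symmetric matrix by paired row/column replacements (a non-orthogonal diagonalization $D = S^TAS$) and reads off the type of the form from the signs of the diagonal entries. The function names and the test matrices are our own assumptions; the sketch handles only the simple case where no row exchange is needed and stops with an error otherwise.

```python
from fractions import Fraction

def diagonalize_by_congruence(A):
    """Diagonal of D = S^T A S obtained by symmetric row/column replacement.

    Assumes A is real symmetric and that no zero pivot blocks the elimination.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            if M[i][k] == 0:
                continue
            if M[k][k] == 0:
                raise ValueError("zero pivot: not handled in this sketch")
            c = M[i][k] / M[k][k]
            for j in range(n):          # row replacement: row_i -= c * row_k
                M[i][j] -= c * M[k][j]
            for j in range(n):          # same operation on the columns
                M[j][i] -= c * M[j][k]
    return [M[k][k] for k in range(n)]

def classify(diagonal):
    """Type of the form, read from the signs of the diagonal entries."""
    pos = any(d > 0 for d in diagonal)
    neg = any(d < 0 for d in diagonal)
    zero = any(d == 0 for d in diagonal)
    if pos and neg:
        return "indefinite"
    if pos:
        return "positive semidefinite" if zero else "positive definite"
    if neg:
        return "negative semidefinite" if zero else "negative definite"
    return "zero form"

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
B = [[1, 2], [2, 1]]
print(diagonalize_by_congruence(A), classify(diagonalize_by_congruence(A)))
print(diagonalize_by_congruence(B), classify(diagonalize_by_congruence(B)))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the signs of the pivots are not affected by rounding.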


4.1. Sylvester's criterion of positivity. It is an easy exercise to see
that a $2 \times 2$ matrix
$$A = \begin{pmatrix} a & b \\ \overline{b} & c \end{pmatrix}$$
is positive definite if and only if
$$a > 0 \quad \text{and} \quad \det A = ac - |b|^2 > 0. \tag{4.1}$$

Indeed, if $a > 0$ and $\det A = ac - |b|^2 > 0$, then $c > 0$, so
$\operatorname{trace} A = a + c > 0$. So we know that if $\lambda_1$,
$\lambda_2$ are the eigenvalues of $A$ then $\lambda_1 \lambda_2 > 0$
($\det A > 0$) and $\lambda_1 + \lambda_2 = \operatorname{trace} A > 0$. But
that is only possible if both eigenvalues are positive. So we have proved
that conditions (4.1) imply that $A$ is positive definite. The opposite
implication is quite simple; we leave it as an exercise for the reader.


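This result can be generalized to the case of $n \times n$ matrices.
Namely, for a matrix $A$
$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \ldots & a_{1,n} \\ a_{2,1} & a_{2,2} & \ldots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \ldots & a_{n,n} \end{pmatrix}$$
let us consider all its upper left submatrices
$$A_1 = (a_{1,1}), \quad A_2 = \begin{pmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{pmatrix}, \quad A_3 = \begin{pmatrix} a_{1,1} & a_{1,2} & a_{1,3} \\ a_{2,1} & a_{2,2} & a_{2,3} \\ a_{3,1} & a_{3,2} & a_{3,3} \end{pmatrix}, \quad \ldots, \quad A_n = A.$$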

Theorem 4.2 (Sylvester's Criterion of Positivity). A matrix $A = A^*$ is
positive definite if and only if
$$\det A_k > 0 \qquad \text{for all } k = 1, 2, \ldots, n.$$

First of all let us notice that if $A > 0$ then $A_k > 0$ also (can you
explain why?). Therefore, since all eigenvalues of a positive definite
matrix are positive, see Theorem 4.1, $\det A_k > 0$ for all $k$.


One can show that if $\det A_k > 0$ for all $k$ then all eigenvalues of $A$
are positive by analyzing the diagonalization of a quadratic form using row
and column operations, which was described in Section 2.2. The key here is
the observation that if we perform row/column operations in the natural
order (i.e. first subtracting the first row/column from all other
rows/columns, then subtracting the second row/column from the rows/columns
$3, 4, \ldots, n$, and so on...), and if we are not doing any row
interchanges, then we automatically diagonalize the quadratic forms $A_k$ as
well. Namely, after we subtract the first and second rows and columns, we
get a diagonalization of $A_2$; after we subtract the third row/column we
get a diagonalization of $A_3$, and so on.

Since we are performing only row replacement we do not change the
determinant. Moreover, since we are not performing row exchanges and are
performing the operations in the correct order, we preserve the determinants
of $A_k$. Therefore, the condition $\det A_k > 0$ guarantees that each new
entry on the diagonal is positive.

Of course, one has to be sure that we can use only row replacements, and
perform the operations in the correct order, i.e. that we do not encounter
any pathological situation. If one analyzes the algorithm, one can see that
the only bad situation that can happen is the situation where at some step
we have zero in the pivot place. In other words, if after we subtracted the
first $k$ rows and columns and obtained a diagonalization of $A_k$, the
entry in the $(k+1)$st row and $(k+1)$st column is 0. We leave it as an
exercise for the reader to show that this is impossible.


The proof we outlined above is quite simple. However, let us present, in 
more detail, another one, which can be found in more advanced textbooks. 
I personally prefer this second proof, for it demonstrates some important 
connections. 


We will need the following characterization of eigenvalues of a Hermitian
matrix.

4.2. Minimax characterization of eigenvalues. Let us recall that the
codimension of a subspace $E \subset X$ is by definition the dimension of
its orthogonal complement, $\operatorname{codim} E = \dim(E^\perp)$. Since
for a subspace $E \subset X$, $\dim X = n$ we have
$\dim E + \dim E^\perp = n$, we can see that
$\operatorname{codim} E = \dim X - \dim E$.

Recall that the trivial subspace {0} has dimension zero, so the whole 
space X has codimension 0. 


Theorem 4.3 (Minimax characterization of eigenvalues). Let $A = A^*$ be
an $n \times n$ matrix, and let
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$ be its eigenvalues taken
in decreasing order. Then
$$\lambda_k = \max_{\substack{E: \\ \dim E = k}} \ \min_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) = \min_{\substack{F: \\ \operatorname{codim} F = k-1}} \ \max_{\substack{x \in F \\ \|x\| = 1}} (Ax, x).$$

Let us explain in more detail what the expressions like max min and
min max mean. To compute the first one, we need to consider all subspaces
$E$ of dimension $k$. For each such subspace $E$ we consider the set of all
$x \in E$ of norm 1, and find the minimum of $(Ax, x)$ over all such $x$.
Thus for each subspace we obtain a number, and we need to pick a subspace
$E$ such that the number is maximal. That is the max min.


The min max is defined similarly. 


Remark. A sophisticated reader may notice a problem here: why do the 
maxima and minima exist? It is well known, that maximum and minimum 
have a nasty habit of not existing: for example, the function f(x) = x has 
neither maximum nor minimum on the open interval (0, 1). 


However, in this case maximum and minimum do exist. There are two possible
explanations of the fact that $(Ax, x)$ attains maximum and minimum. The
first one requires some familiarity with basic notions of analysis: one
should just say that the unit sphere in $E$, i.e. the set
$\{x \in E : \|x\| = 1\}$ is compact, and that a continuous function
($Q[x] = (Ax, x)$ in our case) on a compact set attains its maximum and
minimum.

Another explanation will be to notice that the function $Q[x] = (Ax, x)$,
$x \in E$ is a quadratic form on $E$. It is not difficult to compute the
matrix of this form in some orthonormal basis in $E$, but let us only note
that this matrix is not $A$: it has to be a $k \times k$ matrix, where
$k = \dim E$.


It is easy to see that for a quadratic form the maximum and minimum 
over a unit sphere is the maximal and minimal eigenvalues of its matrix. 


As for optimizing over all subspaces, we will prove below that the maximum
and minimum do exist.

Proof of Theorem 4.3. First of all, by picking an appropriate orthonormal
basis, we can assume without loss of generality that the matrix $A$ is
diagonal, $A = \operatorname{diag}\{\lambda_1, \lambda_2, \ldots, \lambda_n\}$.

Pick subspaces $E$ and $F$, $\dim E = k$, $\operatorname{codim} F = k - 1$,
i.e. $\dim F = n - k + 1$. Since $\dim E + \dim F > n$, there exists a
non-zero vector $x_0 \in E \cap F$. By normalizing it we can assume without
loss of generality that $\|x_0\| = 1$. We can always arrange the eigenvalues
in decreasing order, so let us assume that
$\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$.

Since $x_0$ belongs to both subspaces $E$ and $F$,
$$\min_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) \le (Ax_0, x_0) \le \max_{\substack{x \in F \\ \|x\| = 1}} (Ax, x).$$
We did not assume anything except dimensions about the subspaces $E$ and
$F$, so the above inequality
$$\min_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) \le \max_{\substack{x \in F \\ \|x\| = 1}} (Ax, x) \tag{4.2}$$
holds for all pairs of $E$ and $F$ of appropriate dimensions.

Define
$$E_0 := \operatorname{span}\{e_1, e_2, \ldots, e_k\}, \qquad F_0 := \operatorname{span}\{e_k, e_{k+1}, e_{k+2}, \ldots, e_n\}.$$

Since for a self-adjoint matrix $B$, the maximum and minimum of $(Bx, x)$
over the unit sphere $\{x : \|x\| = 1\}$ are the maximal and the minimal
eigenvalue respectively (easy to check on diagonal matrices), we get that
$$\min_{\substack{x \in E_0 \\ \|x\| = 1}} (Ax, x) = \max_{\substack{x \in F_0 \\ \|x\| = 1}} (Ax, x) = \lambda_k.$$
It follows from (4.2) that for any subspace $E$, $\dim E = k$
$$\min_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) \le \max_{\substack{x \in F_0 \\ \|x\| = 1}} (Ax, x) = \lambda_k$$
and similarly, for any subspace $F$ of codimension $k - 1$,
$$\max_{\substack{x \in F \\ \|x\| = 1}} (Ax, x) \ge \min_{\substack{x \in E_0 \\ \|x\| = 1}} (Ax, x) = \lambda_k.$$

But on subspaces $E_0$ and $F_0$ both maximum and minimum are $\lambda_k$,
so min max = max min = $\lambda_k$.


Corollary 4.4 (Intertwining of eigenvalues). Let
$A = A^* = \{a_{j,k}\}_{j,k=1}^{n}$ be a self-adjoint matrix, and let
$\widetilde{A} = \{a_{j,k}\}_{j,k=1}^{n-1}$ be its submatrix of size
$(n-1) \times (n-1)$. Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ and
$\mu_1, \mu_2, \ldots, \mu_{n-1}$ be the eigenvalues of $A$ and
$\widetilde{A}$ respectively, taken in decreasing order. Then
$$\lambda_1 \ge \mu_1 \ge \lambda_2 \ge \mu_2 \ge \ldots \ge \lambda_{n-1} \ge \mu_{n-1} \ge \lambda_n,$$
i.e.
$$\lambda_k \ge \mu_k \ge \lambda_{k+1}, \qquad k = 1, 2, \ldots, n-1.$$

Proof. Let $X \subset \mathbb{F}^n$ be the subspace spanned by the first
$n - 1$ basis vectors, $X = \operatorname{span}\{e_1, e_2, \ldots, e_{n-1}\}$.
Since $(\widetilde{A}x, x) = (Ax, x)$ for all $x \in X$, Theorem 4.3 implies
that
$$\mu_k = \max_{\substack{E \subset X \\ \dim E = k}} \ \min_{\substack{x \in E \\ \|x\| = 1}} (Ax, x).$$
To get $\lambda_k$ we need to get the maximum over the set of all subspaces
$E$ of $\mathbb{F}^n$, $\dim E = k$, i.e. take the maximum over a bigger set
(any subspace of $X$ is a subspace of $\mathbb{F}^n$). Therefore
$$\mu_k \le \lambda_k$$
(the maximum can only increase, if we increase the set).

On the other hand, any subspace $E \subset X$ of codimension $k - 1$ (here
we mean codimension in $X$) has dimension $n - 1 - (k - 1) = n - k$, so its
codimension in $\mathbb{F}^n$ is $k$. Therefore
$$\mu_k = \min_{\substack{E \subset X \\ \dim E = n-k}} \ \max_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) \ge \min_{\substack{E \subset \mathbb{F}^n \\ \dim E = n-k}} \ \max_{\substack{x \in E \\ \|x\| = 1}} (Ax, x) = \lambda_{k+1}$$
(minimum over a bigger set can only be smaller).
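For a quick sanity check of the intertwining property in the smallest case $n = 2$, one can use the closed formula for the eigenvalues of a real symmetric $2 \times 2$ matrix. The sketch below is only an illustration under that assumption; the function name and the sample entries are ours. Here the submatrix is the $1 \times 1$ matrix $(a)$, so $\mu_1 = a$.

```python
import math

def eigenvalues_2x2(a, b, c):
    """Eigenvalues (in decreasing order) of the real symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean + radius, mean - radius

samples = [(2.0, 1.0, -1.0), (0.0, 3.0, 0.0), (-4.0, 0.5, 7.0)]
for a, b, c in samples:
    lam1, lam2 = eigenvalues_2x2(a, b, c)
    mu1 = a   # the only eigenvalue of the 1 x 1 upper left submatrix
    print(lam1 >= mu1 >= lam2, (lam1, mu1, lam2))
```

Each line printed should start with `True`, in agreement with $\lambda_1 \ge \mu_1 \ge \lambda_2$.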


Proof of Theorem 4.2. If $A > 0$, then $A_k > 0$ for $k = 1, 2, \ldots, n$
as well (can you explain why?). Since all eigenvalues of a positive definite
matrix are positive (see Theorem 4.1), $\det A_k > 0$ for all
$k = 1, 2, \ldots, n$.

Let us now prove the other implication. Let $\det A_k > 0$ for all $k$. We
will show, using induction in $k$, that all $A_k$ (and so $A = A_n$) are
positive definite.

Clearly $A_1$ is positive definite (it is a $1 \times 1$ matrix, so
$A_1 = \det A_1$). Assuming that $A_{k-1} > 0$ (and $\det A_k > 0$) let us
show that $A_k$ is positive definite. Let
$\lambda_1, \lambda_2, \ldots, \lambda_k$ and
$\mu_1, \mu_2, \ldots, \mu_{k-1}$ be the eigenvalues of $A_k$ and $A_{k-1}$
respectively, taken in decreasing order. By Corollary 4.4
$$\lambda_j \ge \mu_j > 0 \qquad \text{for } j = 1, 2, \ldots, k-1.$$

Since $\det A_k = \lambda_1 \lambda_2 \cdots \lambda_{k-1} \lambda_k > 0$,
the last eigenvalue $\lambda_k$ must also be positive. Therefore, since all
its eigenvalues are positive, the matrix $A_k$ is positive definite.


4.3. Some remarks. First of all notice that Sylvester's Criterion of
Positivity does not generalize to positive semidefinite matrices if
$n \ge 3$, meaning that for $n \times n$ matrices, $n \ge 3$, the conditions
$a_{1,1} > 0$, $\det A_k \ge 0$ ($k = 2, \ldots, n$) do not imply that $A$
is positive semidefinite, see Problem 4.6 below.

For $2 \times 2$ matrices, however, the conditions $a_{1,1} > 0$,
$\det A \ge 0$ do imply that $A$ is positive semidefinite, see Problem 4.3
below. This sometimes leads to the wrong conclusion about $n \times n$
matrices.

Finally, we should say a couple of words about negative definite matrices.
It is a typical students' mistake to say that the condition $\det A_k < 0$
implies that $A$ is negative definite. But that is wrong!

To check if the matrix $A$ is negative definite, one just has to check that
the matrix $-A$ is positive definite. Applying Sylvester's Criterion of
Positivity to $-A$ one can see that $A$ is negative definite if and only if
$(-1)^k \det A_k > 0$ for all $k = 1, 2, \ldots, n$.
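The two sign rules (for positive and for negative definite matrices) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample matrices `A`, `neg_A`, `C` are our assumptions, and the determinant is computed by a plain Gaussian elimination with exact rational arithmetic.

```python
from fractions import Fraction

def det(M):
    """Determinant by Gaussian elimination (row exchanges flip the sign)."""
    n = len(M)
    M = [[Fraction(x) for x in row] for row in M]
    sign, result = 1, Fraction(1)
    for k in range(n):
        pivot = next((i for i in range(k, n) if M[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            M[k], M[pivot] = M[pivot], M[k]
            sign = -sign
        result *= M[k][k]
        for i in range(k + 1, n):
            c = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= c * M[k][j]
    return sign * result

def leading_minors(A):
    """Determinants of the upper left submatrices A_1, A_2, ..., A_n."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    return all(m > 0 for m in leading_minors(A))

def is_negative_definite(A):
    # Sylvester's criterion applied to -A: (-1)^k det A_k > 0 for all k
    return all((-1) ** k * m > 0
               for k, m in enumerate(leading_minors(A), start=1))

A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
neg_A = [[-x for x in row] for row in A]
C = [[-1, 0], [0, 1]]      # all det C_k < 0, yet C is indefinite

print(leading_minors(A), is_positive_definite(A))
print(leading_minors(neg_A), is_negative_definite(neg_A))
print(leading_minors(C), is_negative_definite(C))
```

The matrix `C` shows the typical mistake mentioned above: both of its leading minors are negative, but it is not negative definite.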


Exercises. 


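4.1. Using Sylvester's Criterion of Positivity check if the matrices
$$A = \begin{pmatrix} 4 & 2 & 1 \\ 2 & 3 & -1 \\ 1 & -1 & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 3 & -1 & 2 \\ -1 & 4 & -2 \\ 2 & -2 & 1 \end{pmatrix}$$
are positive definite or not.
Are the matrices $-A$, $A^3$ and $A^{-1}$, $A + B^{-1}$, $A + B$, $A - B$
positive definite?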
4.2. True or false: 
a) If A is positive definite, then A® is positive definite. 
b) If A is negative definite, then A® is negative definite. 
c) If $A$ is negative definite, then $A^{12}$ is positive definite.
d) If $A$ is positive definite and $B$ is negative semidefinite, then
   $A - B$ is positive definite.


e) If A is indefinite, and B is positive definite, then A + B is indefinite. 


4.3. Let $A$ be a $2 \times 2$ Hermitian matrix, such that $a_{1,1} > 0$,
$\det A \ge 0$. Prove that $A$ is positive semidefinite.

4.4. Find a real symmetric $n \times n$ matrix such that $\det A_k \ge 0$
for all $k = 1, 2, \ldots, n$, but the matrix $A$ is not positive
semidefinite. Try to find an example for the minimal possible $n$.

4.5. Let $A$ be an $n \times n$ Hermitian matrix such that $\det A_k > 0$
for all $k = 1, 2, \ldots, n - 1$ and $\det A \ge 0$. Prove that $A$ is
positive semidefinite.

4.6. Find a real symmetric $3 \times 3$ matrix $A$ such that $a_{1,1} > 0$,
$\det A_k \ge 0$ for $k = 2, 3$, but the matrix $A$ is not positive
semidefinite.


5. Positive definite forms and inner products 


Let $V$ be an inner product space and let
$\mathcal{B} = v_1, v_2, \ldots, v_n$ be a basis (not necessarily
orthogonal) in $V$. Let $G = \{g_{j,k}\}_{j,k=1}^{n}$ be the matrix defined
by
$$g_{j,k} = (v_k, v_j).$$

If $x = \sum_k x_k v_k$ and $y = \sum_k y_k v_k$, then
$$(x, y) = \Bigl(\sum_k x_k v_k, \sum_j y_j v_j\Bigr) = \sum_{k,j=1}^{n} x_k \overline{y}_j (v_k, v_j) = \sum_{j=1}^{n} \sum_{k=1}^{n} g_{j,k} x_k \overline{y}_j = \bigl(G[x]_{\mathcal{B}}, [y]_{\mathcal{B}}\bigr)_{\mathbb{C}^n},$$
where $(\cdot, \cdot)_{\mathbb{C}^n}$ stands for the standard inner product
in $\mathbb{C}^n$. One can immediately see that $G$ is a positive definite
matrix (why?).


So, when one works with coordinates in an arbitrary (not necessarily 
orthogonal) basis in an inner product space, the inner product (in terms of 
coordinates) is not computed as the standard inner product in C”, but with 
the help of a positive definite matrix G as described above. 


Note, that this $G$-inner product coincides with the standard inner product
in $\mathbb{C}^n$ if and only if $G = I$, which happens if and only if the
basis $v_1, v_2, \ldots, v_n$ is orthonormal.

Conversely, given a positive definite matrix $G$ one can define a
non-standard inner product ($G$-inner product) in $\mathbb{C}^n$ by
$$(x, y)_G := (Gx, y)_{\mathbb{C}^n}, \qquad x, y \in \mathbb{C}^n.$$
One can easily check that $(x, y)_G$ is indeed an inner product, i.e. that
properties 1-4 from Section 1.3 of Chapter 5 are satisfied.
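The computation of the inner product through the matrix $G$ can be checked numerically. The following Python sketch uses a hypothetical non-orthogonal basis of $\mathbb{C}^2$ and hypothetical coordinate vectors (none of them come from the text); it compares the inner product computed directly with the one computed as $(G[x]_{\mathcal{B}}, [y]_{\mathcal{B}})_{\mathbb{C}^n}$.

```python
def dot(u, v):
    """Standard inner product in C^n, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def gram(basis):
    """Matrix G with entries g[j][k] = (v_k, v_j)."""
    return [[dot(vk, vj) for vk in basis] for vj in basis]

def g_inner(G, x, y):
    """(Gx, y) computed with the standard inner product of C^n."""
    Gx = [sum(G[j][k] * x[k] for k in range(len(x))) for j in range(len(G))]
    return dot(Gx, y)

def combine(coords, basis):
    """The vector sum_k coords[k] * basis[k], written in standard coordinates."""
    n = len(basis[0])
    return [sum(c * v[i] for c, v in zip(coords, basis)) for i in range(n)]

basis = [[1, 0], [1, 1]]              # non-orthogonal basis (hypothetical)
G = gram(basis)                       # [[1, 1], [1, 2]]
x_coords, y_coords = [2, 1j], [1, -1]

direct = dot(combine(x_coords, basis), combine(y_coords, basis))
via_gram = g_inner(G, x_coords, y_coords)
print(direct, via_gram)
```

Both numbers coincide, although the standard inner product of the coordinate columns themselves would give a different value, because the basis is not orthonormal.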


Chapter 8 


Dual spaces and 
tensors 


All vector spaces in this chapter are finite-dimensional. 


1. Dual spaces 


1.1. Linear functionals and the dual space. Change of coordinates 
in the dual space. 


Definition 1.1. A linear functional on a vector space $V$ (over a field
$\mathbb{F}$) is a linear transformation $L : V \to \mathbb{F}$.

This special class of linear transformations is sufficiently important to
deserve a separate name.

If one thinks of vectors as some physical objects, like force or velocity,
then one can think of a linear functional as a (linear) measurement that
gives a scalar quantity as the result: think about force or velocity in a
given direction.


Definition 1.2. A collection of all linear functionals on a
finite-dimensional¹ vector space $V$ is called the dual of $V$ and is
usually denoted as $V'$ or $V^*$.

As it was discussed earlier in Section 4 of Chapter 1, the collection
$\mathcal{L}(V, W)$ of all linear transformations acting from $V$ to $W$ is
a vector space (with naturally defined operations of addition and
multiplication by a scalar). So, the dual space
$V' = \mathcal{L}(V, \mathbb{F})$ is a vector space.

¹ We consider here only finite-dimensional spaces because for
infinite-dimensional spaces the dual space consists not of all but only of
the so-called bounded linear functionals. Without giving the precise
definition, let us only mention that in the finite-dimensional case (both
the domain and the target space are finite-dimensional) all linear
transformations are bounded, and we do not need to mention the word bounded.

Let us consider an example. Let the space $V$ be $\mathbb{R}^n$; what is
its dual? As we know, a linear transformation
$T : \mathbb{R}^n \to \mathbb{R}^m$ is represented by an $m \times n$
matrix, so a linear functional on $\mathbb{R}^n$ (i.e. a linear
transformation $L : \mathbb{R}^n \to \mathbb{R}$) is given by a
$1 \times n$ matrix (row); let us denote it by $[L]$. The collection of all
such rows is isomorphic to $\mathbb{R}^n$ (the isomorphism is given by
taking the transpose $[L] \mapsto [L]^T$).

So, the dual of $\mathbb{R}^n$ is $\mathbb{R}^n$ itself. The same holds
true for $\mathbb{C}^n$, of course, as well as for $\mathbb{F}^n$, where
$\mathbb{F}$ is an arbitrary field. Since the space $V$ over a field
$\mathbb{F}$ (here we are mostly interested in the case
$\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$) of dimension $n$ is
isomorphic to $\mathbb{F}^n$, and the dual to $\mathbb{F}^n$ is isomorphic
to $\mathbb{F}^n$, we can conclude that the dual $V'$ is isomorphic to $V$.

Thus, the definition of the dual space is starting to look a bit silly, since 
it does not appear to give us anything new. 


However, that is not the case! If we look carefully, we can see that the
dual space is indeed a new object. To see that, let us analyze how the
entries of the matrix $[L]$ (which we can call the coordinates of $L$)
change when we change the basis in $V$.


1.1.1. Change of coordinates formula. Let
$$\mathcal{A} = \{a_1, a_2, \ldots, a_n\}, \qquad \mathcal{B} = \{b_1, b_2, \ldots, b_n\}$$
be two bases in $V$, and let
$[L]_{\mathcal{A}} = [L]_{\mathcal{S},\mathcal{A}}$ and
$[L]_{\mathcal{B}} = [L]_{\mathcal{S},\mathcal{B}}$ be the matrices of $L$
in the bases $\mathcal{A}$ and $\mathcal{B}$ respectively (we suppose that
the basis in the target space of scalars is always the standard one, so we
can skip the subscript $\mathcal{S}$ in the notation). Then recalling the
change of coordinate rule from Section 8.4 in Chapter 2 we get that
$$[L]_{\mathcal{B}} = [L]_{\mathcal{A}} [I]_{\mathcal{A},\mathcal{B}}.$$
Recall that for a vector $v \in V$ its coordinates in different bases are
related by the formula
$$[v]_{\mathcal{B}} = [I]_{\mathcal{B},\mathcal{A}} [v]_{\mathcal{A}},$$
and that
$$[I]_{\mathcal{A},\mathcal{B}} = [I]_{\mathcal{B},\mathcal{A}}^{-1}.$$
If we denote $S := [I]_{\mathcal{B},\mathcal{A}}$, so
$[v]_{\mathcal{B}} = S[v]_{\mathcal{A}}$, the entries of the vectors
$[L]_{\mathcal{B}}^T$ and $[L]_{\mathcal{A}}^T$ are related by the formula
$$[L]_{\mathcal{B}}^T = (S^{-1})^T [L]_{\mathcal{A}}^T \tag{1.1}$$
(since we usually represent a vector as a column of its coordinates, we use
$[L]_{\mathcal{A}}^T$ and $[L]_{\mathcal{B}}^T$ instead of
$[L]_{\mathcal{A}}$ and $[L]_{\mathcal{B}}$).

Saying it in words:

If $S$ is the change of coordinate matrix (from old coordinates to the new
ones) in $X$, then the change of coordinate matrix in the dual space $X'$
is $(S^{-1})^T$.


So, the dual space V’ of V while isomorphic to V is indeed a different 
object: the difference is in how the coordinates in V and V’ change when 
one changes the basis in V. 
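The difference between the two transformation rules is easy to see on numbers. In the Python sketch below the matrix `S`, the coordinates `v_A` and the row `L_A` are hypothetical values chosen for illustration: the coordinates of a vector are multiplied by $S$, the coordinates of a functional by $(S^{-1})^T$, and the value $L(v)$ computed in either basis is the same.

```python
from fractions import Fraction

def inverse_2x2(S):
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_vec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(col) for col in zip(*M)]

S = [[2, 1], [3, 2]]      # hypothetical change of coordinates matrix [I]_{B,A}
v_A = [3, -1]             # coordinates of v in the old basis A
L_A = [5, 2]              # entries of the row [L]_A, stored as a column

v_B = mat_vec(S, v_A)                              # vectors: multiply by S
L_B = mat_vec(transpose(inverse_2x2(S)), L_A)      # functionals: (S^{-1})^T

value_A = sum(l * x for l, x in zip(L_A, v_A))     # [L]_A [v]_A
value_B = sum(l * x for l, x in zip(L_B, v_B))     # [L]_B [v]_B
print(v_B, L_B, value_A, value_B)
```

If one (wrongly) transformed the functional by $S$ as well, the two values would in general differ, which is exactly why the dual space is a different object.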


Remark. One can ask: why can't we pick a basis in $X$ and some completely
unrelated basis in the dual $X'$? Of course, we can do that, but imagine
what it would take to compute $L(x)$, knowing the coordinates of $x$ in some
basis and the coordinates of $L$ in some completely unrelated basis.

So, if we want (knowing the coordinates of a vector $x$ in some basis)
to compute the action of a linear functional $L$ using the standard rules of
matrix algebra, i.e. to multiply a row (the functional) by a column (the
vector), we have no choice: the "coordinates" of the linear functional $L$
should be the entries of its matrix (in the same basis).

As we will see later, see Section 1.3 below, the entries ("coordinates")
of a linear functional are indeed the coordinates in some basis (the
so-called dual basis).


1.1.2. A uniqueness theorem. 


Lemma 1.3. Let $v \in V$. If $L(v) = 0$ for all $L \in V'$ then $v = 0$. As
a corollary, if $L(v_1) = L(v_2)$ for all $L \in V'$, then $v_1 = v_2$.

Proof. Fix a basis $\mathcal{B}$ in $V$. Then
$$L(v) = [L]_{\mathcal{B}} [v]_{\mathcal{B}}.$$

Picking different matrices (i.e. different $L$) we can easily see that
$[v]_{\mathcal{B}} = 0$. Indeed, if
$$L_k = [0, \ldots, 0, 1, 0, \ldots, 0]$$
(with 1 in the $k$th place), then the equality
$$L_k [v]_{\mathcal{B}} = 0$$
implies that the $k$th coordinate of $[v]_{\mathcal{B}}$ is 0.

Using this equality for all $k$ we conclude that $[v]_{\mathcal{B}} = 0$,
so $v = 0$.


1.2. Second dual. As we discussed above, the dual space $V'$ is a vector
space, so one can consider its dual $V'' = (V')'$. It looks like one can
then consider the dual $V'''$ of $V''$ and so on... However, the fun stops
with $V''$ because

The second dual $V''$ is canonically (i.e. in a natural way) isomorphic
to $V$.

Let us decipher this statement. Any vector $v \in V$ canonically defines a
linear functional $L_v$ on $V'$ (i.e. an element of the second dual $V''$)
by the rule
$$L_v(f) = f(v) \qquad \forall f \in V'.$$

It is easy to check that the mapping $T : V \to V''$, $Tv = L_v$, is a
linear transformation.

Note, that $\operatorname{Ker} T = \{0\}$. Indeed, if $Tv = 0$, then
$$f(v) = 0 \qquad \forall f \in V',$$
and by Lemma 1.3 above we have $v = 0$.

Since dim V” = dim V’ = dim V, the condition Ker T = {0} implies that 
T is an invertible transformation (isomorphism). 

The isomorphism $T$ is very natural (at least for a mathematician). In
particular, it was defined without using a basis, so it does not depend on
the choice of basis. So, informally we say that $V''$ is canonically
isomorphic to $V$: the rigorous statement is that the map $T$ described
above (which we consider to be natural and canonical) is an isomorphism
from $V$ to $V''$.


1.3. Dual, a.k.a. biorthogonal bases. In the previous sections, we several
times referred to the entries of the matrix of a linear functional $L$ as
coordinates. But coordinates in this book usually mean the coordinates in
some basis. Are the "coordinates" of a linear functional really coordinates
in some basis? It turns out the answer is "yes", so the terminology remains
consistent.

Let us find the basis corresponding to the coordinates of $L \in V'$. Let
$\{b_1, b_2, \ldots, b_n\}$ be a basis in $V$. For $L \in V'$, let
$[L]_{\mathcal{B}} = [L_1, L_2, \ldots, L_n]$ be its matrix (row) in the
basis $\mathcal{B}$. Consider linear functionals
$b_1', b_2', \ldots, b_n' \in V'$ defined by
$$b_k'(b_j) = \delta_{k,j}, \tag{1.2}$$
where $\delta_{k,j}$ is the Kronecker delta,
$$\delta_{k,j} = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases}$$
Recall, that a linear transformation is defined by its action on a basis, so
the functionals $b_k'$ are well defined.


As one can easily see, the functional $L$ can be represented as
$$L = \sum_k L_k b_k'.$$

Indeed, take an arbitrary $v = \sum_k \alpha_k b_k \in V$, so
$[v]_{\mathcal{B}} = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T$. By linearity
and the definition of $b_k'$,
$$b_k'(v) = b_k'\Bigl(\sum_j \alpha_j b_j\Bigr) = \sum_j \alpha_j b_k'(b_j) = \alpha_k.$$

Therefore
$$Lv = [L]_{\mathcal{B}} [v]_{\mathcal{B}} = \sum_k L_k \alpha_k = \sum_k L_k b_k'(v).$$

Since this identity holds for all $v \in V$, we conclude that
$L = \sum_k L_k b_k'$.

Since we did not assume anything about $L \in V'$, we have just shown that
any linear functional $L$ can be represented as a linear combination of
$b_1', b_2', \ldots, b_n'$, so the system $b_1', b_2', \ldots, b_n'$ is
generating.

Let us show that this system is linearly independent (and so it is a basis).
Let $0 = \sum_k L_k b_k'$. Then for an arbitrary $j = 1, 2, \ldots, n$
$$0 = 0(b_j) = \Bigl(\sum_k L_k b_k'\Bigr)(b_j) = \sum_k L_k b_k'(b_j) = L_j,$$
so $L_j = 0$. Therefore, all $L_k$ are 0 and the system is linearly
independent.

So, the system $b_1', b_2', \ldots, b_n'$ is indeed a basis in the dual
space $V'$ and the entries of $[L]_{\mathcal{B}}$ are the coordinates of $L$
with respect to this basis.

Definition 1.4. Let $b_1, b_2, \ldots, b_n$ be a basis in $V$. The system of
vectors
$$b_1', b_2', \ldots, b_n' \in V',$$
uniquely defined by the equation (1.2) is called the dual (or biorthogonal)
basis to $b_1, b_2, \ldots, b_n$.

Note that we have shown that the dual system to a basis is a basis. Note
also that if $b_1', b_2', \ldots, b_n'$ is the dual system to a basis
$b_1, b_2, \ldots, b_n$, then $b_1, b_2, \ldots, b_n$ is the dual to the
basis $b_1', b_2', \ldots, b_n'$.

1.3.1. Abstract non-orthogonal Fourier decomposition. The dual system can
be used for computing the coordinates in the basis $b_1, b_2, \ldots, b_n$.
Let $b_1', b_2', \ldots, b_n'$ be the biorthogonal system to
$b_1, b_2, \ldots, b_n$, and let $v = \sum_k \alpha_k b_k$. Then, as it was
shown before,
$$b_j'(v) = b_j'\Bigl(\sum_k \alpha_k b_k\Bigr) = \sum_k \alpha_k b_j'(b_k) = \alpha_j b_j'(b_j) = \alpha_j,$$
so $\alpha_k = b_k'(v)$. Then we can write
$$v = \sum_k b_k'(v) b_k. \tag{1.3}$$

In other words,

The $k$th coordinate of a vector $v$ in a basis
$\mathcal{B} = \{b_1, b_2, \ldots, b_n\}$ is $b_k'(v)$, where
$\mathcal{B}' = \{b_1', b_2', \ldots, b_n'\}$ is the dual basis.

This formula is called (a baby version of) the abstract non-orthogonal
Fourier decomposition of $v$ (in the basis $b_1, b_2, \ldots, b_n$). The
reason for this name will be clear later in Section 2.3.
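Formula (1.3) can be tried out on a small example. The Python sketch below works in $\mathbb{R}^2$ with a hypothetical non-orthogonal basis `b1`, `b2` (our choice, not from the text). It relies on the observation that if the basis vectors are the columns of a matrix, then the rows of the inverse matrix are the matrices of the dual functionals, since the product of the inverse and the matrix is the identity, which is exactly condition (1.2).

```python
from fractions import Fraction

b1, b2 = [1, 1], [1, 2]   # a non-orthogonal basis of R^2 (hypothetical)

# Inverse of the matrix with columns b1, b2; its rows represent b1', b2'.
det = Fraction(b1[0] * b2[1] - b2[0] * b1[1])
dual = [[b2[1] / det, -b2[0] / det],
        [-b1[1] / det, b1[0] / det]]

def apply(functional, v):
    """Action of a functional (given by its row) on a vector."""
    return sum(f * x for f, x in zip(functional, v))

v = [3, 5]
alpha = [apply(dual[0], v), apply(dual[1], v)]          # alpha_k = b_k'(v)
reconstructed = [alpha[0] * b1[i] + alpha[1] * b2[i] for i in range(2)]
print(alpha, reconstructed)
```

The coordinates `alpha` are obtained by applying the dual functionals, and the sum $\sum_k b_k'(v) b_k$ gives back `v`.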


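Remark 1.5. Let $\mathcal{A} = \{a_1, a_2, \ldots, a_n\}$ and
$\mathcal{B} = \{b_1, b_2, \ldots, b_m\}$ be bases in $X$ and $Y$
respectively, and let $\mathcal{B}' = \{b_1', b_2', \ldots, b_m'\}$ be the
dual basis to $\mathcal{B}$. Then the matrix
$[T]_{\mathcal{B},\mathcal{A}} =: A = \{a_{k,j}\}_{k=1,}^{m}{}_{j=1}^{n}$ of
the transformation $T$ in the bases $\mathcal{A}$, $\mathcal{B}$ is given by
$$a_{k,j} = b_k'(T a_j), \qquad j = 1, 2, \ldots, n, \quad k = 1, 2, \ldots, m.$$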


1.4. Examples of dual systems. The first example we consider is a trivial
one. Let $V$ be $\mathbb{R}^n$ (or $\mathbb{C}^n$) and let
$e_1, e_2, \ldots, e_n$ be the standard basis there. The dual space will be
the space of $n$-dimensional row vectors, which is isomorphic to
$\mathbb{R}^n$ (or $\mathbb{C}^n$ in the complex case), and the standard
basis there is the dual to $e_1, e_2, \ldots, e_n$. The standard basis in
$(\mathbb{R}^n)'$ (or in $(\mathbb{C}^n)'$) is
$e_1^T, e_2^T, \ldots, e_n^T$, obtained from $e_1, e_2, \ldots, e_n$ by
transposition.


1.4.1. Taylor formula. The next example is more interesting. Let us
consider the space $\mathbb{P}_n$ of polynomials of degree at most $n$. As
we know, the powers $\{e_k\}_{k=0}^{n}$, $e_k(t) = t^k$, form the standard
basis in this space. What is the dual to this basis?

The answer might be tricky to guess, but it is very easy to check when
you know it. Namely, consider the linear functionals
$e_k' \in (\mathbb{P}_n)'$, $k = 0, 1, \ldots, n$, acting on polynomials as
follows:
$$e_k'(p) = \frac{1}{k!} \frac{d^k}{dt^k} p(t) \Big|_{t=0} = \frac{1}{k!} p^{(k)}(0);$$
here we use the usual agreement that $0! = 1$ and $d^0 f/dt^0 = f$.

Since
$$\frac{d^k}{dt^k} t^j = \begin{cases} j(j-1)\cdots(j-k+1)\, t^{j-k}, & k \le j, \\ 0, & k > j, \end{cases}$$
we can easily see that the system $\{e_k'\}_{k=0}^{n}$ is the dual to the
system of powers $\{e_k\}_{k=0}^{n}$.

Applying (1.3) to the above system $\{e_k\}_{k=0}^{n}$ and its dual we get
that any polynomial $p$ of degree at most $n$ can be represented as
$$p(t) = \sum_{k=0}^{n} \frac{p^{(k)}(0)}{k!} t^k. \tag{1.4}$$


This formula is well-known in Calculus as the Taylor formula for
polynomials. More precisely, this is a particular case of the Taylor
formula, the so-called Maclaurin formula. The general Taylor formula
$$p(t) = \sum_{k=0}^{n} \frac{p^{(k)}(a)}{k!} (t - a)^k$$
can be obtained from (1.4) by applying it to the polynomial
$q(\tau) = p(\tau + a)$ and then denoting $t := \tau + a$. It also can be
obtained by considering the powers $(t - a)^k$, $k = 0, 1, \ldots, n$ and
finding the dual system the same way we did it for $t^k$.²
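As a small check of the claim that the functionals $p \mapsto p^{(k)}(a)/k!$ give the coordinates in the basis of powers $(t - a)^k$, here is a Python sketch. A polynomial is stored as the list of its coefficients in the standard basis; the function names and the sample polynomial are our assumptions.

```python
from fractions import Fraction
from math import factorial

def derivative(coeffs):
    """Coefficients of p' for p(t) = sum_j coeffs[j] * t**j."""
    return [j * c for j, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    return sum(c * t ** j for j, c in enumerate(coeffs))

def dual_functional(k, coeffs, a=0):
    """p^(k)(a) / k!  (a = 0 is the Maclaurin case of formula (1.4))."""
    for _ in range(k):
        coeffs = derivative(coeffs)
    return Fraction(evaluate(coeffs, a), factorial(k))

p = [4, -3, 0, 2]                                   # p(t) = 4 - 3t + 2t^3
maclaurin = [dual_functional(k, p) for k in range(4)]
taylor_at_1 = [dual_functional(k, p, a=1) for k in range(4)]

# Rebuild p(2) from the Taylor coefficients at a = 1.
rebuilt = sum(c * (2 - 1) ** k for k, c in enumerate(taylor_at_1))
print(maclaurin, taylor_at_1, rebuilt, evaluate(p, 2))
```

For $a = 0$ the functionals simply return the coefficients of $p$, and for $a = 1$ they return the coefficients of the expansion in powers of $(t - 1)$.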

1.4.2. Lagrange interpolation. Our next example deals with the so-called
Lagrange interpolating formula. Let $a_1, a_2, \ldots, a_{n+1}$ be distinct
points (in $\mathbb{R}$ or $\mathbb{C}$), and let $\mathbb{P}_n$ be the
space of polynomials of degree at most $n$. Define functionals
$f_k \in (\mathbb{P}_n)'$ by
$$f_k(p) = p(a_k) \qquad \forall p \in \mathbb{P}_n.$$

What is the dual of this system of functionals? Note, that while it is not
hard to show that the functionals $f_k$ are linearly independent, and so
(since $\dim(\mathbb{P}_n)' = \dim \mathbb{P}_n = n + 1$) form a basis in
$(\mathbb{P}_n)'$, we do not need that. We will construct the dual system
directly, and then will be able to see that the system
$f_1, f_2, \ldots, f_{n+1}$ is indeed a basis.

Namely, let us define the polynomials $p_k$, $k = 1, 2, \ldots, n + 1$ as
$$p_k(t) = \prod_{j : j \ne k} (t - a_j) \Big/ \prod_{j : j \ne k} (a_k - a_j),$$
where $j$ in the products runs from 1 to $n + 1$. Clearly $p_k(a_k) = 1$ and
$p_k(a_j) = 0$ if $j \ne k$, so indeed the system
$p_1, p_2, \ldots, p_{n+1}$ is dual to the system
$f_1, f_2, \ldots, f_{n+1}$.

There is a little detail here, since the notion of a dual system was defined
only for a basis, and we did not prove that either of the systems is one.
But one can immediately see that the system $p_1, p_2, \ldots, p_{n+1}$ is
linearly independent (can you explain why?), and since it contains
$n + 1 = \dim \mathbb{P}_n$ vectors, it is a basis. Therefore, the system of
functionals $f_1, f_2, \ldots, f_{n+1}$ is also a basis in the dual space
$(\mathbb{P}_n)'$.


Remark. Note, that we did not just get lucky here, this is a general
phenomenon. Namely, as Problem 1.1 below asserts, any system of vectors
having a "dual" one must be linearly independent. So, constructing a dual
system is a way of proving linear independence (and an easy one, if you can
do it easily as in the above example).


² Note, that the general Taylor formula says more than the formula for
polynomials obtained here: it says that any $n$ times differentiable
function can be approximated near the point $a$ by its Taylor polynomial.
Moreover, if the function is $n + 1$ times differentiable, it allows us to
estimate the error. The above formula for polynomials serves as a motivation
and a starting point for the general case.

Applying formula (1.3) to the above example one can see that the (unique)
polynomial $p$, $\deg p \le n$ satisfying
$$p(a_k) = y_k, \qquad k = 1, 2, \ldots, n + 1 \tag{1.5}$$
can be reconstructed by the formula
$$p(t) = \sum_{k=1}^{n+1} y_k p_k(t). \tag{1.6}$$

This formula is well-known in mathematics as the "Lagrange interpolation
formula".
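Formula (1.6) translates directly into a few lines of Python. The sketch below is an illustration with hypothetical data (the nodes and the values are taken from the polynomial $t^2 + 1$, our choice); exact rational arithmetic is used, and the nodes are assumed to be distinct.

```python
from fractions import Fraction

def lagrange_basis(nodes, k, t):
    """Value at t of p_k: equal to 1 at nodes[k] and 0 at the other nodes."""
    value = Fraction(1)
    for j, a in enumerate(nodes):
        if j != k:
            value *= Fraction(t - a, nodes[k] - a)
    return value

def interpolate(nodes, values, t):
    """p(t) = sum_k y_k * p_k(t), formula (1.6)."""
    return sum(y * lagrange_basis(nodes, k, t) for k, y in enumerate(values))

nodes = [0, 1, 3]
values = [1, 2, 10]        # values of t**2 + 1 at the nodes
print(interpolate(nodes, values, 2))
```

Since the interpolating polynomial of degree at most 2 is unique, the value at $t = 2$ must be $2^2 + 1 = 5$, and the biorthogonality $p_k(a_j) = \delta_{k,j}$ can be checked in the same way.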
Exercises.

1.1. Let $v_1, v_2, \ldots, v_r$ be a system of vectors in $X$ such that
there exists a system $v_1', v_2', \ldots, v_r'$ of linear functionals such
that
$$v_k'(v_j) = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases}$$
a) Show that the system $v_1, v_2, \ldots, v_r$ is linearly independent.
b) Show that if the system $v_1, v_2, \ldots, v_r$ is not generating, then
   the "biorthogonal" system $v_1', v_2', \ldots, v_r'$ is not unique. Hint:
   Probably the easiest way to prove that is to complete the system
   $v_1, v_2, \ldots, v_r$ to a basis, see Proposition 5.4 from Chapter 2.

1.2. Prove that given distinct points $a_1, a_2, \ldots, a_{n+1}$ and values
$y_1, y_2, \ldots, y_{n+1}$ (not necessarily distinct) the polynomial $p$,
$\deg p \le n$ satisfying (1.5) is unique. Try to prove it using the ideas
from linear algebra, and not what you know about polynomials.


2. Dual of an inner product space 


Let us recall that there is no inner product space over an arbitrary field, 
that all our inner product spaces are either real or complex. 
2.1. Riesz representation theorem. 


Theorem 2.1 (Riesz representation theorem). Let H be an inner product 
space. Given a linear functional L on H there exists a unique vector y © H 
such that 


(2.1) L(v) = (v,y) Vv € H. 

Proof. Fix an orthonormal basis e1, e2,...,@n in H, and let 
[L] = [L1, L2, none Ln] 

be the matrix of L in this basis. Define vector y by 


(2.2) y= S> Tree, 
k 


2. Dual of an inner product space 225 


where $\overline{L_k}$ denotes the complex conjugate of $L_k$. In the case of a real space the conjugation does nothing and can be simply ignored.

We claim that $y$ satisfies (2.1).
Indeed, take an arbitrary vector $v = \sum_k \alpha_k e_k$. Then
$[v] = [\alpha_1, \alpha_2, \ldots, \alpha_n]^T$
and

$L(v) = [L][v] = \sum_k L_k \alpha_k$.

On the other hand
$(v, y) = \sum_k \alpha_k \overline{\overline{L_k}} = \sum_k \alpha_k L_k$,

so (2.1) holds.


To show that the vector $y$ is unique, let us assume that $y$ satisfies (2.1). Then for $k = 1, 2, \ldots, n$

$(e_k, y) = L(e_k) = L_k$,

so $(y, e_k) = \overline{L_k}$. Then, using the formula for the decomposition in the orthonormal basis, see Section 2.1 of Chapter 5, we get

$y = \sum_k (y, e_k)\, e_k = \sum_k \overline{L_k}\, e_k$,

which means that any vector satisfying (2.1) must be represented by (2.2).


Remark. While the statement of the theorem does not require a basis, the proof presented above utilizes an orthonormal basis in $H$, although the resulting vector $y$ does not depend on the choice of the basis.³ An advantage of this proof is that it gives a formula for computing the representing vector $y$.
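The formula (2.2) is easy to check in coordinates. The following sketch assumes $H = \mathbb{C}^3$ with the standard inner product and the standard (orthonormal) basis; the helper names `inner` and `riesz_vector` and the sample numbers are hypothetical.

```python
def inner(v, w):
    """Standard inner product in C^n: linear in v, conjugate linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def riesz_vector(L_row):
    """Formula (2.2) in the standard basis: the k-th coordinate of y is the conjugate of L_k."""
    return [L_k.conjugate() for L_k in L_row]

L_row = [1 + 2j, 3j, -1 + 0j]   # the matrix [L] of a functional L (sample data)
v = [2 - 1j, 1 + 1j, 4j]        # an arbitrary test vector

y = riesz_vector(L_row)
L_of_v = sum(L_k * a for L_k, a in zip(L_row, v))   # L(v) = [L][v]

print(L_of_v == inner(v, y))    # identity (2.1) for this particular v
```

The entries are small Gaussian integers, so the floating point comparison is exact here; with general data one would compare up to a tolerance.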


2.2. Is an inner product space a dual to itself? For a vector $y$ in an inner product space $H$ one can define a linear functional $L_y$,

$L_y(v) = (v, y)$.

It is easy to see that the mapping $y \mapsto L_y$ is an injective mapping from $H$ to its dual $H'$. The above Theorem 2.1 asserts that this mapping is a surjection (onto), so one is tempted to say that the dual of an inner product space $H$ is (canonically isomorphic to) the space $H$ itself, with the canonical isomorphism given by $y \mapsto L_y$.


³ An alternative proof that does not need a basis is also possible. This alternative proof, which works in the infinite-dimensional case, uses strong convexity of the unit ball in the inner product space together with the idea of completeness from analysis.


(Margin note to the proof of Theorem 2.1: Recall that if we know coordinates of 2 vectors in an orthonormal basis, we can compute the inner product by taking these coordinates and computing the standard inner product in $\mathbb{C}^n$ (or $\mathbb{R}^n$).)

This is indeed the case if $H$ is a real inner product space: it is easy to show that the map $y \mapsto L_y$ is a linear transformation. We already discussed that the map is injective and surjective, so it is an invertible linear transformation, i.e. an isomorphism.


However if $H$ is a complex space, one needs to be a bit more careful. Namely, the mapping $y \mapsto L_y$ that maps a vector $y \in H$ to the linear functional $L_y$ as in Theorem 2.1 ($L_y(v) = (v, y)$) is not a linear one.

More precisely, while it is easy to show that

(2.3) $L_{y_1 + y_2} = L_{y_1} + L_{y_2}$,

it follows from the definition of $L_y$ and properties of inner product that

(2.4) $L_{\alpha y}(v) = (v, \alpha y) = \overline{\alpha}\,(v, y) = \overline{\alpha}\, L_y(v)$,

so $L_{\alpha y} = \overline{\alpha}\, L_y$.

In other words, one can say that the dual of a complex inner product space is the space itself but with a different linear structure: adding 2 vectors is equivalent to adding the corresponding linear functionals, but multiplying a vector by $\alpha$ is equivalent to multiplying the corresponding functional by $\overline{\alpha}$.


A transformation $T$ satisfying $T(\alpha x + \beta y) = \overline{\alpha}\, Tx + \overline{\beta}\, Ty$ is sometimes called a conjugate linear transformation.

So, for a complex inner product space $H$ its dual can be canonically identified with $H$ by a conjugate linear isomorphism (i.e. invertible conjugate linear transformation).
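The conjugate linearity of $y \mapsto L_y$ can be seen on a small example. This is a sketch in $\mathbb{C}^2$ with the standard inner product; the names `inner` and `functional` and the sample vectors are assumptions made for the illustration.

```python
def inner(v, w):
    """Standard inner product in C^n: linear in v, conjugate linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def functional(y):
    """The functional L_y, defined by L_y(v) = (v, y)."""
    return lambda v: inner(v, y)

alpha = 2 + 1j
y = [1 + 1j, 2 - 1j]
v = [3j, 1 - 2j]

scaled = functional([alpha * c for c in y])(v)   # L_{alpha y}(v)

# The scalar comes out of the second argument conjugated, as in (2.4) ...
print(scaled == alpha.conjugate() * functional(y)(v))
# ... so for non-real alpha the map y -> L_y is not linear.
print(scaled == alpha * functional(y)(v))
```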

Of course, for a real inner product space the complex conjugation can be simply ignored (because $\alpha$ is real), so the map $y \mapsto L_y$ is a linear one. In this case we can, indeed, say that the dual of an inner product space $H$ is the space itself.


In both, real and complex cases, we nevertheless can say that the dual 
of an inner product space can be canonically identified with the space itself. 


2.3. Biorthogonal systems and orthonormal bases. 


Definition 2.2. Let $b_1, b_2, \ldots, b_n$ be a basis in an inner product space $H$. The unique system $b_1', b_2', \ldots, b_n'$ in $H$ defined by

$(b_j, b_k') = \delta_{j,k}$,

where $\delta_{j,k}$ is the Kronecker delta, is called the biorthogonal or dual to the basis $b_1, b_2, \ldots, b_n$.


This definition clearly agrees with Definition 1.4, if one identifies the dual $H'$ with $H$ as it was discussed above. Then it follows immediately from the discussion in Section 1.3 that the dual system $b_1', b_2', \ldots, b_n'$ to a basis $b_1, b_2, \ldots, b_n$ is uniquely defined and forms a basis, and that the dual to $b_1', b_2', \ldots, b_n'$ is $b_1, b_2, \ldots, b_n$.

The abstract non-orthogonal Fourier decomposition formula (1.3) can be rewritten as

$v = \sum_{k=1}^n (v, b_k')\, b_k$.

Note that an orthonormal basis is dual to itself. So, if $e_1, e_2, \ldots, e_n$ is an orthonormal basis, the above formula is rewritten as

$v = \sum_{k=1}^n (v, e_k)\, e_k$,

which is the classical (orthogonal) abstract Fourier decomposition, see formula (2.2) in Section 2.1 of Chapter 5.
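As a concrete illustration of the biorthogonal system, the sketch below works in $\mathbb{R}^2$ with the standard inner product. It relies on the observation (an assumption of this example, easy to verify) that the dual vectors are the rows of the inverse of the matrix whose columns are $b_1, b_2$; the name `dual_basis_2d` is hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Standard inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))

def dual_basis_2d(b1, b2):
    """Rows of the inverse of the 2x2 matrix with columns b1, b2."""
    det = Fraction(b1[0] * b2[1] - b2[0] * b1[1])
    d1 = [b2[1] / det, -b2[0] / det]
    d2 = [-b1[1] / det, b1[0] / det]
    return d1, d2

b1, b2 = [1, 0], [1, 1]            # a non-orthogonal basis (sample data)
d1, d2 = dual_basis_2d(b1, b2)

# Biorthogonality: (b_j, b'_k) is 1 for j = k and 0 otherwise.
print([[dot(b, d) for d in (d1, d2)] for b in (b1, b2)])

# Non-orthogonal Fourier decomposition: v = (v, b'_1) b_1 + (v, b'_2) b_2.
v = [3, 5]
coeffs = [dot(v, d1), dot(v, d2)]
rebuilt = [coeffs[0] * p + coeffs[1] * q for p, q in zip(b1, b2)]
print(rebuilt == v)
```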


3. Adjoint (dual) transformations and transpose. 
Fundamental subspace revisited (once more) 


By analogy with the case of an inner product space, see Theorem 2.1, it is customary to write $L(v)$, where $L$ is a linear functional (i.e. $L \in V'$, $v \in V$), in the form resembling an inner product

$L(v) = \langle v, L \rangle$.

Note that the expression $\langle v, L \rangle$ is linear in both arguments, unlike the inner product which in the case of a complex space is linear in the first argument and conjugate linear in the second. So, to distinguish it from the inner product, we use the angular brackets.⁴

Note also that while in the inner product both vectors belong to the same space, $v$ and $L$ above belong to different spaces: in particular, we cannot add them.


3.1. Dual (adjoint) transformation. 


Definition 3.1. Let $A : X \to Y$ be a linear transformation. The transformation $A' : Y' \to X'$ ($X'$ and $Y'$ are dual spaces for $X$ and $Y$ respectively) such that

$\langle Ax, y' \rangle = \langle x, A'y' \rangle \qquad \forall x \in X,\ \forall y' \in Y'$

is called the adjoint (dual) to $A$.


⁴ This notation, while widely used, is far from standard. Sometimes $(v, L)$ is used, sometimes the angular brackets are used for the inner product. So, encountering an expression like that in a text, one has to be very careful to distinguish the inner product from the action of a linear functional.

Of course, it is not a priori clear why the transformation $A'$ exists. Below we will show that indeed such a transformation exists, and moreover, it is unique.


3.1.1. Dual transformation for the case $A : \mathbb{F}^n \to \mathbb{F}^m$. Let us first consider the case when $X = \mathbb{F}^n$, $Y = \mathbb{F}^m$ ($\mathbb{F}$ here is, as usual, either $\mathbb{R}$ or $\mathbb{C}$, but everything works for the case of arbitrary fields).

As usual, we identify a vector $v$ in $\mathbb{F}^n$ with the column of its coordinates, and a linear transformation with its matrix (in the standard basis).

The dual of $\mathbb{F}^n$ is, as it was discussed above, the space of rows of size $n$, so we can identify it with $\mathbb{F}^n$. Again, we will treat an element of $(\mathbb{F}^n)'$ as a column vector of its coordinates.

Under these agreements we have for $x \in \mathbb{F}^n$ and $x' \in (\mathbb{F}^n)'$

$x'(x) = \langle x, x' \rangle = (x')^T x$,

where the right side is the product of matrices (of a row and a column). Then, for arbitrary $x \in X = \mathbb{F}^n$ and $y' \in Y' = (\mathbb{F}^m)'$

$\langle Ax, y' \rangle = (y')^T A x = (A^T y')^T x = \langle x, A^T y' \rangle$

(the expressions in the middle are products of matrices).


So we have proved that the adjoint transformation exists. Let us show that it is unique. Assume that for some transformation $B$

$\langle Ax, y' \rangle = \langle x, By' \rangle \qquad \forall x \in \mathbb{F}^n,\ \forall y' \in (\mathbb{F}^m)'$.

That means that

$\langle x, (A^T - B) y' \rangle = 0 \qquad \forall x \in \mathbb{F}^n,\ \forall y' \in (\mathbb{F}^m)'$.

By taking for $x$ and $y'$ the vectors from the standard bases in $\mathbb{F}^n$ and $(\mathbb{F}^m)' = \mathbb{F}^m$ respectively we get that the matrices $B$ and $A^T$ coincide.

So, for $X = \mathbb{F}^n$, $Y = \mathbb{F}^m$

The dual transformation $A'$ exists, and is unique. Moreover, its matrix (in the standard bases) equals $A^T$ (the transpose of the matrix of $A$).
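The identity $\langle Ax, y' \rangle = \langle x, A^T y' \rangle$ can be checked on a small numerical example. The following is a minimal sketch with matrices stored as lists of rows; the helper names and the sample data are hypothetical.

```python
def matvec(M, x):
    """Product of a matrix (list of rows) and a column vector."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def pairing(x, x_dual):
    """<x, x'> = (x')^T x, with x' stored as a column of coordinates."""
    return sum(a * b for a, b in zip(x_dual, x))

A = [[1, 2, 0],
     [3, -1, 4]]      # A : F^3 -> F^2
x = [2, 1, -1]        # a vector in F^3
y_dual = [5, -2]      # a functional on F^2

lhs = pairing(matvec(A, x), y_dual)              # <Ax, y'>
rhs = pairing(x, matvec(transpose(A), y_dual))   # <x, A^T y'>
print(lhs, rhs)
```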


3.1.2. Dual transformation in the abstract setting. Now, let us consider the 
general case. In fact, we do not need to do much, since everything can be 
reduced to the case of spaces F”. 

Namely, let us fix bases $\mathcal{A} = a_1, a_2, \ldots, a_n$ in $X$, and $\mathcal{B} = b_1, b_2, \ldots, b_m$ in $Y$, and let $\mathcal{A}' = a_1', a_2', \ldots, a_n'$ and $\mathcal{B}' = b_1', b_2', \ldots, b_m'$ be their dual bases (in $X'$ and $Y'$ respectively). For a vector $v$ (from a space or its dual) we as usual denote by $[v]_{\mathcal{B}}$ the column of its coordinates in the basis $\mathcal{B}$. Then

$\langle x, x' \rangle = ([x']_{\mathcal{A}'})^T [x]_{\mathcal{A}} \qquad \forall x \in X,\ \forall x' \in X'$,

i.e. instead of working with $x \in X$ and $x' \in X'$ we can work with columns of their coordinates (in the dual bases $\mathcal{A}$ and $\mathcal{A}'$ respectively) absolutely the same way we do in the case of $\mathbb{F}^n$. Of course, the same works for $Y$, so working with columns of coordinates and then translating everything back to the abstract setting we get that the dual transformation exists and is unique in this case as well.


Moreover, using the fact (which we just proved) that for $A : \mathbb{F}^n \to \mathbb{F}^m$ the matrix of $A'$ is $A^T$ we get

(3.1) $[A']_{\mathcal{A}', \mathcal{B}'} = ([A]_{\mathcal{B}, \mathcal{A}})^T$,

or in plain English

The matrix of the dual transformation in the dual bases is the transpose of the matrix of the transformation in the original bases.


Remark 3.2. Note that while we used bases to construct the dual transformation, the resulting transformation does not depend on the choice of the bases.


3.1.3. A coordinate-free way to define the dual transformation. Let us now present another, more “high brow” way of defining the dual of a linear transformation. Namely, for $x \in X$, $y' \in Y'$ let us fix for a moment $y'$ and treat the expression $\langle Ax, y' \rangle = y'(Ax)$ as a function of $x$. It is easy to see that this is a composition of two linear transformations (which ones?) and so it is a linear function of $x$, i.e. a linear functional on $X$, i.e. an element of $X'$.

Let us call this linear functional $B(y')$ to emphasize the fact that it depends on $y'$. Since we can do this for every $y' \in Y'$, we can define the transformation $B : Y' \to X'$ such that

$\langle Ax, y' \rangle = \langle x, B(y') \rangle$.


Our next step is to show that $B$ is a linear transformation. Note that since the transformation $B$ was defined in a rather indirect way, we cannot see immediately from the definition that it is linear. To show the linearity of $B$ let us take $y_1', y_2' \in Y'$ and scalars $\alpha$, $\beta$. For $x \in X$

$\langle x, B(\alpha y_1' + \beta y_2') \rangle = \langle Ax, \alpha y_1' + \beta y_2' \rangle$   (by the definition of $B$)
$= \alpha \langle Ax, y_1' \rangle + \beta \langle Ax, y_2' \rangle$   (by linearity)
$= \alpha \langle x, B(y_1') \rangle + \beta \langle x, B(y_2') \rangle$   (by the definition of $B$)
$= \langle x, \alpha B(y_1') + \beta B(y_2') \rangle$   (by linearity).

Since this identity is true for all $x$, we conclude that $B(\alpha y_1' + \beta y_2') = \alpha B(y_1') + \beta B(y_2')$, i.e. that $B$ is linear.

The main advantage of this approach is that it does not require a basis, so it can be (and is) used in the infinite-dimensional situation. However, the proof that we presented above in Sections 3.1.1, 3.1.2 gives a constructive way to compute the dual transformation, so we used that proof instead of the more general coordinate-free one.
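The coordinate-free construction translates almost literally into code if linear maps and functionals are represented as functions. The sketch below is an illustration only; the name `dual` and the sample map `A` and functional `y_dual` are hypothetical.

```python
def dual(A):
    """Given a linear map A : X -> Y (as a function), return A' : Y' -> X'."""
    def A_dual(y_dual):
        # B(y') is the functional x -> y'(Ax), i.e. the composition of y' and A.
        return lambda x: y_dual(A(x))
    return A_dual

def A(x):
    """A sample linear map from F^3 to F^2."""
    return (x[0] + 2 * x[1], 3 * x[0] - x[1] + 4 * x[2])

def y_dual(y):
    """A sample linear functional on F^2."""
    return 5 * y[0] - 2 * y[1]

functional_on_X = dual(A)(y_dual)   # an element of X'

# Evaluating on the standard basis recovers the coordinates of A'y',
# which agree with the column A^T y' from Section 3.1.1.
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
print([functional_on_X(e) for e in basis])
```

No basis is used in the definition of `dual`; the standard basis appears only when we want to print coordinates.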


Remark 3.3. Note that the above coordinate-free approach can be used to define the Hermitian adjoint of an operator in an inner product space. The only addition to the reasoning presented above will be the use of the Riesz Representation Theorem (Theorem 2.1). We leave the details as an exercise to the reader, see Problem 3.2 below.


3.2. Annihilators and relations between fundamental subspaces. 


Definition 3.4. Let $X$ be a vector space and let $E \subset X$. The annihilator of $E$, denoted by $E^\perp$, is the set of all $x' \in X'$ such that $\langle x, x' \rangle = 0$ for all $x \in E$.

Using the fact that $X''$ is canonically isomorphic to $X$ (see Section 1.2) we say that for $E \subset X'$ its annihilator $E^\perp$ consists of all vectors $x \in X$ such that $\langle x, x' \rangle = 0$ for all $x' \in E$.


Remark 3.5. Formally speaking, for $E \subset X'$ the set $E^\perp$ should be defined as the set of all $x'' \in X''$ such that $\langle x', x'' \rangle = 0$ for all $x' \in E$; the symbol $E_\perp$ is often used for the annihilator from the second part of Definition 3.4. However, because of the natural isomorphism of $X''$ and $X$ there is no real difference between these two cases, so we will always use $E^\perp$.

Distinguishing the cases $E \subset X$ and $E \subset X'$ makes a lot of sense in the infinite-dimensional situation, where $X''$ is not always canonically isomorphic to $X$.

The spaces such that $X''$ is canonically isomorphic to $X$ have a special name: they are called reflexive spaces.


Proposition 3.6. Let $E$ be a subspace of $X$. Then $(E^\perp)^\perp = E$.


This proposition looks absolutely like Proposition 3.6 from Chapter 5. However its proof is a bit more complicated, since the suggested proof of Proposition 3.6 from Chapter 5 heavily used the inner product space structure: it used the decomposition $X = E \oplus E^\perp$, which is not true in our situation because, for example, $E$ and $E^\perp$ are in different spaces.


Proof. Let $v_1, v_2, \ldots, v_r$ be a basis in $E$ (recall that all spaces in this chapter are assumed to be finite-dimensional), so $E = \operatorname{span}\{v_1, v_2, \ldots, v_r\}$.

By Proposition 5.4 from Chapter 2 the system can be extended to a basis in all $X$, i.e. one can find vectors $v_{r+1}, \ldots, v_n$ ($n = \dim X$) such that $v_1, v_2, \ldots, v_n$ is a basis in $X$.

Let $v_1', v_2', \ldots, v_n'$ be the dual basis to $v_1, v_2, \ldots, v_n$. By Problem 3.3, $E^\perp = \operatorname{span}\{v_{r+1}', \ldots, v_n'\}$. Applying this problem again to $E^\perp$ we get that

$(E^\perp)^\perp = \operatorname{span}\{v_1, v_2, \ldots, v_r\} = E$.


The following theorem is analogous to Theorem 5.1 from Chapter 5.


Theorem 3.7. Let $A : X \to Y$ be an operator acting from one vector space to another. Then

a) $\operatorname{Ker} A' = (\operatorname{Ran} A)^\perp$;
b) $\operatorname{Ker} A = (\operatorname{Ran} A')^\perp$;
c) $\operatorname{Ran} A = (\operatorname{Ker} A')^\perp$;
d) $\operatorname{Ran} A' = (\operatorname{Ker} A)^\perp$.


Proof. First of all, let us notice that since for a subspace $E$ we have $(E^\perp)^\perp = E$, the statements a) and c) are equivalent. Similarly, for the same reason, the statements b) and d) are equivalent as well. Finally, statement b) is exactly statement a) applied to the operator $A'$ (here we use the trivial fact that $(A')' = A$, which is true, for example, because of the corresponding fact for the transpose).

So, to prove the theorem we only need to prove statement a).

Recall that $A' : Y' \to X'$. The inclusion $y' \in (\operatorname{Ran} A)^\perp$ means that $y'$ annihilates all vectors of the form $Ax$, i.e. that

$\langle Ax, y' \rangle = 0 \qquad \forall x \in X$.

Since $\langle Ax, y' \rangle = \langle x, A'y' \rangle$, the last identity is equivalent to

$\langle x, A'y' \rangle = 0 \qquad \forall x \in X$.

But that means that $A'y' = 0$ ($A'y'$ is the zero functional).

So we have proved that $y' \in (\operatorname{Ran} A)^\perp$ iff $A'y' = 0$, or equivalently iff $y' \in \operatorname{Ker} A'$.
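Statement a) can be observed on a small example in coordinates, where $A'$ is given by the transposed matrix. The sketch below only tests a few sample vectors, so it illustrates the statement rather than proves it; the helper names and the data are hypothetical.

```python
def matvec(M, x):
    """Product of a matrix (list of rows) and a column vector."""
    return [sum(m * c for m, c in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def pairing(x, x_dual):
    """<x, x'> = (x')^T x."""
    return sum(a * b for a, b in zip(x_dual, x))

A = [[1, 2],
     [2, 4],
     [3, 6]]                   # A : F^2 -> F^3, Ran A is spanned by (1, 2, 3)^T
samples = [[1, 0], [0, 1], [3, -2]]

in_kernel = [2, -1, 0]         # A^T y' = 0, so y' is in Ker A'
not_in_kernel = [1, 0, 0]      # A^T y' is not 0

print(matvec(transpose(A), in_kernel))
print([pairing(matvec(A, x), in_kernel) for x in samples])      # y' annihilates Ran A
print(matvec(transpose(A), not_in_kernel))
print([pairing(matvec(A, x), not_in_kernel) for x in samples])  # not all zeros
```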


Exercises. 

3.1. Prove that if for linear transformations $T, T_1 : X \to Y$

$\langle Tx, y' \rangle = \langle T_1 x, y' \rangle$

for all $x \in X$ and for all $y' \in Y'$, then $T = T_1$.

Probably one of the easiest ways of proving this is to use Lemma 1.3.


3.2. Combine the Riesz Representation Theorem (Theorem 2.1) with the reasoning in Section 3.1.3 above to present a coordinate-free definition of the Hermitian adjoint of an operator in an inner product space.

The next problem gives a way to prove Proposition 3.6.


3.3. Let $v_1, v_2, \ldots, v_n$ be a basis in $X$ and let $v_1', v_2', \ldots, v_n'$ be its dual basis. Let $E := \operatorname{span}\{v_1, v_2, \ldots, v_r\}$, $r < n$. Prove that $E^\perp = \operatorname{span}\{v_{r+1}', \ldots, v_n'\}$.


3.4. Use the previous problem to prove that for a subspace $E \subset X$

$\dim E + \dim E^\perp = \dim X$.


4. What is the difference between a space and its dual? 


We know that the dual space X’ has the same dimension as X, so the 
space and its dual are isomorphic. So one can think that really there is no 
difference between the space and its dual. However, as we discussed above 
in Section 1.1, when we change basis in the space X the coordinates in X 
and in X’ change according to different rules, see formula (1.1) above. 


On the other hand, using the natural isomorphism of X and X” we can 
say that X is the dual of X’. From this point of view, there is no difference 
between X and X’: we can start from X and say that X’ is its dual, or we 
can do it the other way around and start from X’. 

We already used this point of view above, for example in the proof of 
Theorem 3.7. 

Note also that the change of coordinate formula (1.1) (see also the boxed statement below it) agrees with this point of view: if $\widetilde{S} := (S^{-1})^T$, then $(\widetilde{S}^{-1})^T = S$, so we get the change of coordinate formula in $X$ from the one in $X'$ by the same rule!


4.1. Isomorphisms between X and X’. There are infinitely many pos- 
sibilities to define an isomorphism between X and X’. 

If $X = \mathbb{F}^n$ then the most natural way to identify $X$ and $X'$ is to identify the standard basis in $\mathbb{F}^n$ with the one in $(\mathbb{F}^n)'$. In this case the action of a linear functional will be given by the “inner product type” expression

$\langle v, v' \rangle = (v')^T v$.

To generalize this to the general case one has to fix a basis $\mathcal{B} = b_1, b_2, \ldots, b_n$ in $X$ and consider the dual basis $\mathcal{B}' = b_1', b_2', \ldots, b_n'$, and define an isomorphism $T : X \to X'$ by $T b_k = b_k'$, $k = 1, 2, \ldots, n$.

This isomorphism is natural in some sense, but it depends on the choice of the basis, so in general there is no natural way to identify $X$ and $X'$.

The exception to this is the case when $X$ is a real inner product space: the Riesz representation theorem (Theorem 2.1) gives a natural way to identify a linear functional with a vector in $X$. Note that this approach works only for real inner product spaces. In the complex case, the Riesz representation theorem gives a natural identification of $X$ and $X'$, but this identification is not linear but conjugate linear.


4.2. An example: velocities (differential operators) and differential forms as vectors and linear functionals. To illustrate the relations between vectors and linear functionals, let us consider an example from multivariable calculus, which gives rise to important ideas like tangent and cotangent bundles in differential geometry.

Let us recall the notion of the path integral (of the second kind) from calculus. Recall that a path $\gamma$ in $\mathbb{R}^n$ is defined by its parameterization, i.e. by a function

$t \mapsto x(t) = (x_1(t), x_2(t), \ldots, x_n(t))^T$

acting from an interval $[a, b]$ to $\mathbb{R}^n$. If $\omega$ is the so-called differential form (differential 1-form),

$\omega = f_1(x)\,dx_1 + f_2(x)\,dx_2 + \ldots + f_n(x)\,dx_n$,

the path integral

$\int_\gamma \omega = \int_\gamma f_1\,dx_1 + f_2\,dx_2 + \ldots + f_n\,dx_n$

is computed by substituting $x(t) = (x_1(t), x_2(t), \ldots, x_n(t))^T$ in the expression, i.e. $\int_\gamma \omega$ is computed as

$\int_a^b \left( f_1(x(t)) \frac{dx_1(t)}{dt} + f_2(x(t)) \frac{dx_2(t)}{dt} + \ldots + f_n(x(t)) \frac{dx_n(t)}{dt} \right) dt$.

In other words, at each moment $t$ we have to evaluate the velocity

$v = \frac{dx(t)}{dt} = \left( \frac{dx_1(t)}{dt}, \frac{dx_2(t)}{dt}, \ldots, \frac{dx_n(t)}{dt} \right)^T$,

apply to it the linear functional $f = (f_1, f_2, \ldots, f_n)$, $f(v) = \sum_{k=1}^n f_k v_k$ (here $f_k = f_k(x(t))$ but for a fixed $t$ each $f_k$ is just a number, so we simply write $f_k$), and then integrate the result (which depends on $t$) with respect to $t$.


4.2.1. Velocities as vectors. Let us fix $t$ and analyze $f(v)$. We will show that according to the rules of Calculus, the coordinates of $v$ change as coordinates of a vector, and the coordinates of $f$ as the coordinates of a linear functional (covector). Let us assume, as is customary in Calculus, that $x_k$ are the coordinates in the standard basis in $\mathbb{R}^n$, and let $\mathcal{B} = \{b_1, b_2, \ldots, b_n\}$ be a different basis in $\mathbb{R}^n$. We will use the notation $\widetilde{x}_k$ to denote the coordinates of a vector $x = (x_1, x_2, \ldots, x_n)^T$ in the basis $\mathcal{B}$, i.e. $[x]_{\mathcal{B}} = (\widetilde{x}_1, \widetilde{x}_2, \ldots, \widetilde{x}_n)^T$.

Let $A = \{a_{k,j}\}_{k,j=1}^n$ be the change of coordinates matrix, $A = [I]_{\mathcal{B}, \mathcal{S}}$, so the new coordinates $\widetilde{x}_k$ are expressed in terms of the old ones as

$\widetilde{x}_k = \sum_{j=1}^n a_{k,j}\, x_j, \qquad k = 1, 2, \ldots, n$.

So the new coordinates $\widetilde{v}_k$ of the vector $v$ are obtained from its old coordinates $v_k$ as

$\widetilde{v}_k = \sum_{j=1}^n a_{k,j}\, v_j, \qquad k = 1, 2, \ldots, n$.

4.2.2. Differential forms as linear functionals (covectors). Let us now calculate the differential form

(4.1) $\omega = \sum_{k=1}^n f_k\,dx_k$

in terms of the new coordinates $\widetilde{x}_k$. The change of coordinates matrix from the new to the old ones is $A^{-1}$. Let $A^{-1} = \{\widetilde{a}_{k,j}\}_{k,j=1}^n$, so

$x_k = \sum_{j=1}^n \widetilde{a}_{k,j}\, \widetilde{x}_j$ and $dx_k = \sum_{j=1}^n \widetilde{a}_{k,j}\, d\widetilde{x}_j$, $k = 1, 2, \ldots, n$.

Substituting this into (4.1) we get

$\omega = \sum_{k=1}^n f_k \sum_{j=1}^n \widetilde{a}_{k,j}\, d\widetilde{x}_j = \sum_{j=1}^n \widetilde{f}_j\, d\widetilde{x}_j$,

where

$\widetilde{f}_j = \sum_{k=1}^n \widetilde{a}_{k,j}\, f_k$.

But that is exactly the change of coordinate rule for the dual space! So

according to the rules of Calculus, the coefficients of a differential 1-form change by the same rule as coordinates in the dual space.

So, according to the accepted rules of Calculus, the coordinates of velocity $v$ change as coordinates of a vector and coefficients (coordinates) of a differential 1-form change as the entries of a linear functional. In differential geometry the set of all velocities is called the tangent space, and the set of all differential 1-forms is its dual and is called the cotangent space.

4.2.3. Differential operators as vectors. As we discussed above, in differential geometry vectors are represented by velocities, i.e. by the derivatives $dx(t)/dt$. This is a simple and intuitively clear point of view, but sometimes it is viewed as a bit naive.

A more “highbrow” point of view, also used in differential geometry (although in more advanced texts), is that vectors are represented by differential operators

(4.2) $D = \sum_{k=1}^n v_k \frac{\partial}{\partial x_k}$.

The informal reason for that is the following. Suppose we want to compute the derivative of a function $\Phi$ along the path given by the function $t \mapsto x(t)$, i.e. the derivative

$\frac{d\Phi(x(t))}{dt}$.

By the Chain Rule, at a given time $t$

$\frac{d\Phi(x(t))}{dt} = \sum_{k=1}^n \left( \frac{\partial \Phi}{\partial x_k} \Big|_{x = x(t)} \right) x_k'(t) = D\Phi \big|_{x = x(t)}$,

where the differential operator $D$ is given by (4.2) with $v_k = x_k'(t)$.

Of course, we need to show that the coefficients $v_k$ of a differential operator change according to the change of coordinate rule for vectors. This is intuitively clear, and can be easily shown by using the multivariable Chain Rule. We leave this as an exercise for the reader, see Problem 4.1 below.


4.3. The case of a real inner product space. As we already discussed 
above, it follows from the Riesz Representation Theorem (Theorem 2.1) that 
a real inner product space X and its dual X’ are canonically isomorphic. 
Thus we can say that vectors and functionals live in the same space which 
makes things both simpler and more confusing. 


Remark. First of all let us note that if the change of coordinates matrix $S$ is orthogonal ($S^{-1} = S^T$), then $(S^{-1})^T = S$. Therefore, for an orthogonal change of coordinate matrix the coordinates of a vector and of a linear functional change according to the same rule, so one cannot really see a difference between a vector and a functional.


The change of coordinate matrix is orthogonal, for example, if we change 
from one orthonormal basis to another. 


4.3.1. Einstein notation, metric tensor. Let $\mathcal{B} = \{b_1, b_2, \ldots, b_n\}$ be a basis in a real inner product space $X$ and let $\mathcal{B}' = \{b_1', b_2', \ldots, b_n'\}$ be the dual basis (we identify the dual space $X'$ with $X$ via the Riesz Representation Theorem, so $b_k'$ can be assumed to be in $X$).

Here we present the notation that is standard in differential geometry (the so-called Einstein notation) for working with coordinates in these bases. Since we will only be working with coordinates, we can assume that we are working in the space $\mathbb{R}^n$ with the non-standard inner product $(\cdot, \cdot)_G$ defined by the positive definite matrix $G = \{g_{j,k}\}_{j,k=1}^n$, $g_{j,k} = (b_k, b_j)_X$, which is often called the metric tensor,

(4.3) $(x, y) = (x, y)_G = \sum_{j=1}^n \sum_{k=1}^n g_{j,k}\, x_k\, y_j, \qquad x, y \in \mathbb{R}^n$

(see Section 5 in Chapter 7).


To distinguish between vectors and linear functionals (co-vectors) it is agreed to write the coordinates of a vector with indices as superscripts and the coordinates of a linear functional with indices as subscripts: thus $x^j$, $j = 1, 2, \ldots, n$ denotes the coordinates of a vector $x$ and $f_k$, $k = 1, 2, \ldots, n$ denotes the coordinates of a linear functional $f$.


Remark. Putting indices as superscripts can be confusing, since one will 
need to distinguish it from the power. However, this is a standard and widely 
used notation, so we need to get acquainted with it. While I personally, 
like a lot of mathematicians, prefer using coordinate-free notation, all final 
computations are done in coordinates, so the coordinate notation has to be 
used. And as far as coordinate notations go, you will see that this notation 
is quite convenient to work with. 


Another convention in the Einstein notation is that whenever in a product the same index appears in the subscript and superscript, it means one needs to sum over this index. Thus $x^j f_j$ means $\sum_j x^j f_j$, so we can write $f(x) = x^j f_j$. The same convention holds when we have more than one index of summation, so (4.3) can be rewritten in this notation as

(4.4) $(x, y) = g_{j,k}\, x^k y^j, \qquad x, y \in \mathbb{R}^n$

(mathematicians are lazy and are always trying to avoid writing extra symbols, whenever they can).


Finally, the last convention in the Einstein notation is the preservation of the position of the indices: if we do not sum over an index, it remains in the same position (subscript or superscript) as it was before. Thus we can write $y^j = a^j_k x^k$, but not $f_j = a^j_k x^k$, because the index $j$ must remain as a superscript.

Note that to compute the inner product of 2 vectors, knowing their coordinates is not sufficient. One also needs to know the matrix $G$ (which is often called the metric tensor). This agrees with the Einstein notation: if we try to write $(x, y)$ as the standard inner product, the expression $x^k y^k$ means just the product of coordinates, since for the summation we need the same index both as the subscript and the superscript. The expression (4.4), on the other hand, fits this convention perfectly.

4.3.2. Covariant and contravariant coordinates. Lowering and raising the indices. Let us recall that we have a basis $b_1, b_2, \ldots, b_n$ in a real inner product space, and that $b_1', b_2', \ldots, b_n'$, $b_k' \in X$, is its dual basis (we identify $X$ with its dual $X'$ via the Riesz Representation Theorem, so $b_k'$ are in $X$).

Given a vector $x \in X$ it can be represented as

(4.5) $x = \sum_{k=1}^n (x, b_k')\, b_k =: \sum_{k=1}^n x^k b_k$, and as

(4.6) $x = \sum_{k=1}^n (x, b_k)\, b_k' =: \sum_{k=1}^n x_k b_k'$.

The coordinates $x_k$ are called the covariant coordinates of the vector $x$ and the coordinates $x^k$ are called the contravariant coordinates.


Now let us ask ourselves a question: how can one get covariant coordi- 
nates of a vector from the contravariant ones? 


According to the Einstein notation, we use the contravariant coordinates working with vectors, and covariant ones for linear functionals (i.e. when we interpret a vector $x \in X$ as a linear functional). We know (see (4.6)) that $x_k = (x, b_k)$, so

$x_k = (x, b_k) = \Big( \sum_j x^j b_j, b_k \Big) = \sum_j x^j (b_j, b_k) = \sum_j g_{k,j}\, x^j$,

or in the Einstein notation

$x_k = g_{k,j}\, x^j$.

In other words,

the metric tensor $G$ is the change of coordinates matrix from contravariant coordinates $x^k$ to the covariant ones $x_k$.

The operation of getting from contravariant coordinates to covariant is called lowering of the indices.

Note the following interpretation of the formula (4.4) for the inner product: as we know, for the vector $x$ we get its covariant coordinates as $x_k = g_{k,j}\, x^j$, so $(x, y) = x_k y^k$. Similarly, because $G$ is symmetric, we can say that $y_k = g_{k,j}\, y^j$ and that $(x, y) = x^k y_k$. In other words


To compute the inner product of two vectors, one first needs to 
use the metric tensor G to lower indices of one vector, and then, 
treating this vector as a functional compute its value on the other 
vector. 




Of course, we can also change from covariant coordinates $x_j$ to contravariant ones $x^j$ (raise the indices). Since

$(x_1, x_2, \ldots, x_n)^T = G\,(x^1, x^2, \ldots, x^n)^T$,

we get that

$(x^1, x^2, \ldots, x^n)^T = G^{-1}(x_1, x_2, \ldots, x_n)^T$,

so the change of coordinate matrix in this case is $G^{-1}$.

Since, as we know, the change of coordinate matrix is the metric tensor, we can immediately conclude that $G^{-1}$ is the metric tensor in covariant coordinates, i.e. that if $G^{-1} = \{g^{k,j}\}_{k,j=1}^n$ then

$(x, y) = g^{k,j}\, x_j\, y_k$.
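Lowering and raising of indices amount to multiplication by $G$ and $G^{-1}$, which can be checked on a small example. The sketch assumes the basis $b_1 = (1, 0)^T$, $b_2 = (1, 1)^T$ in $\mathbb{R}^2$ with the standard inner product, which gives the metric tensor $G$ below; the variable names (`x_up` for contravariant, `x_down` for covariant coordinates) are hypothetical.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(M, x):
    return [sum(m * c for m, c in zip(row, x)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

G = [[1, 1],
     [1, 2]]                       # g_{j,k} = (b_k, b_j) for b_1 = (1,0), b_2 = (1,1)
G_inv = inverse_2x2(G)

x_up, y_up = [1, 2], [3, -1]       # contravariant coordinates x^k, y^k
x_down = matvec(G, x_up)           # lowering: x_k = g_{k,j} x^j
y_down = matvec(G, y_up)

print(dot(x_up, matvec(G, y_up)))            # (x, y) computed with the metric tensor
print(dot(x_down, y_up), dot(x_up, y_down))  # x_k y^k and x^k y_k
print(dot(x_down, matvec(G_inv, y_down)))    # g^{k,j} x_j y_k
print(matvec(G_inv, x_down) == x_up)         # raising the indices undoes lowering
```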


Remark. Note, that if one looks at the big picture, the covariant and con- 
travariant coordinates are completely interchangeable. It is just the matter 
of which one of the bases in the dual pair B and B’ we assign to be the 
“primary” one and which one to be the dual. 


What to choose as a “primary” object, and what as the “dual” one, depends mostly on accepted conventions.


Remark 4.1. Einstein notation is usually used in differential, and espe- 
cially Riemannian geometry, where vectors are identified with velocities and 
covectors (linear functionals) with the differential 1-forms, see Section 4.2 
above. Vectors and covectors here are clearly different objects and form 
what is called tangent and cotangent spaces respectively. 


In Riemannian geometry one then introduces an inner product (i.e. the metric tensor, if one thinks in terms of coordinates) on the tangent space, which allows us to identify vectors and covectors (linear functionals). In coordinate representation this identification is done by lowering/raising indices, as described above.


4.4. Conclusions. Let us summarize the above discussion on whether or 
not a space is different from its dual. 


In short, the answer is “Yes”, they are different objects. Although in the 
finite-dimensional case, which is treated in this book, they are isomorphic, 
nothing is usually gained from the identification of a space and its dual. 


Even in the simplest case of $\mathbb{F}^n$ it is useful to think that the elements of $\mathbb{F}^n$ are columns and the elements of its dual are rows (even though, when doing manipulations with the elements of the dual space we often put the rows vertically). More striking examples are the ones considered in Sections 1.4.1 and 1.4.2 dealing with the Taylor formula and Lagrange interpolation. One can clearly see there that the linear functionals are indeed completely different objects than polynomials, and that hardly anything can be gained by identifying functionals with the polynomials.


For inner product spaces the situation is different, because such spaces 
can be canonically identified with their duals. This identification is linear 
for real inner product spaces, so a real inner product space is canonically 
isomorphic to its dual. In the case of complex spaces, this identification is 
only conjugate linear, but it is nevertheless very helpful to identify a linear 
functional with a vector and use the inner product space structure and ideas 
like orthogonality, self-adjointness, orthogonal projections, etc. 


However, sometimes even in the case of real inner product spaces, it is more natural to consider the space and its dual as different objects. For example, in Riemannian geometry, see Remark 4.1 above, vectors and covectors come from different objects, velocities and differential 1-forms respectively. Even though the introduction of the metric tensor allows us to identify vectors and covectors, it is sometimes more convenient to remember their origins and think of them as different objects.


Exercises. 


4.1. Let $D$ be a differential operator

$D = \sum_{k=1}^n v_k \frac{\partial}{\partial x_k}$.

Show, using the chain rule, that if we change a basis and write $D$ in new coordinates, its coefficients $v_k$ change according to the change of coordinates rule for vectors.


5. Multilinear functions. Tensors 
5.1. Multilinear functions. 


Definition 5.1. Let $V_1, V_2, \ldots, V_p, V$ be vector spaces (over the same field $\mathbb{F}$). A multilinear ($p$-linear) map with values in $V$ is a function $F$ of $p$ vector variables $v_1, v_2, \ldots, v_p$, $v_k \in V_k$, with the target space $V$, which is linear in each variable $v_k$. In other words, it means that if we fix all variables except $v_k$ we get a linear map, and this should be true for all $k = 1, 2, \ldots, p$. We will use the symbol $L(V_1, V_2, \ldots, V_p; V)$ for the set of all such multilinear functions.


If the target space $V$ is the field of scalars $\mathbb F$, we call $F$ a multilinear
functional, or tensor. The number $p$ is called the valency of the multilinear
functional (tensor). Thus, a tensor of valency 1 is a linear functional, and a tensor
of valency 2 is called a bilinear form.


Example. Let $f_k \in (V_k)'$. Define a polylinear functional $F = f_1 \otimes f_2 \otimes \dots \otimes f_p$
by multiplying the functionals $f_k$,


(5.1)  $f_1 \otimes f_2 \otimes \dots \otimes f_p(v_1, v_2, \dots, v_p) = f_1(v_1) f_2(v_2) \cdots f_p(v_p)$,


for $v_k \in V_k$, $k = 1, 2, \dots, p$. The polylinear functional $f_1 \otimes f_2 \otimes \dots \otimes f_p$ is
called the tensor product of functionals $f_k$.
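
The tensor product of functionals in (5.1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: vectors are lists of coordinates, a functional is given by its coefficient list, and the helper names `functional` and `tensor_product` are hypothetical, not notation from the text.

```python
def functional(coeffs):
    """Linear functional v -> sum_j coeffs[j] * v[j] (coordinates are assumed)."""
    return lambda v: sum(c * x for c, x in zip(coeffs, v))

def tensor_product(*functionals):
    """Return (v_1, ..., v_p) -> f_1(v_1) * ... * f_p(v_p), as in (5.1)."""
    def F(*vectors):
        result = 1
        for f, v in zip(functionals, vectors):
            result *= f(v)
        return result
    return F

f1 = functional([1, 2])       # a functional on R^2 (illustrative coefficients)
f2 = functional([0, 1, -1])   # a functional on R^3 (illustrative coefficients)
F = tensor_product(f1, f2)

# Check linearity in the first argument with the second one fixed.
u, v, w = [1, 1], [2, -1], [3, 0, 1]
a, b = 2, 5
combo = [a * x + b * y for x, y in zip(u, v)]
lhs = F(combo, w)
rhs = a * F(u, w) + b * F(v, w)
```

Here `lhs` and `rhs` agree, which is one instance of linearity in the first slot; linearity in the second slot can be checked the same way.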


5.1.1. Multilinear functions form a vector space. Notice that in the space
$L(V_1, V_2, \dots, V_p; V)$ one can introduce the natural operations of addition
and multiplication by a scalar,


$(F_1 + F_2)(v_1, v_2, \dots, v_p) = F_1(v_1, v_2, \dots, v_p) + F_2(v_1, v_2, \dots, v_p)$,


$(\alpha F_1)(v_1, v_2, \dots, v_p) = \alpha F_1(v_1, v_2, \dots, v_p)$,


where $F_1, F_2 \in L(V_1, V_2, \dots, V_p; V)$, $\alpha \in \mathbb F$.
Equipped with these operations, the space $L(V_1, V_2, \dots, V_p; V)$ is a vector
space.


To see that, we first need to show that $F_1 + F_2$ and $\alpha F_1$ are multilinear
functions. Since "multilinear" means linear in each argument separately
(with all the other variables fixed), this follows from the corresponding
fact about linear transformations; namely, from the fact that the sum of linear
transformations and a scalar multiple of a linear transformation are linear
transformations, cf. Section 4 of Chapter 1.


Then it is easy to show that $L(V_1, V_2, \dots, V_p; V)$ satisfies all axioms of
a vector space; one just needs to use the fact that $V$ satisfies these axioms. We
leave the details as an exercise for the reader. He/she can look at Section
4 of Chapter 1, where it was shown that the set of linear transformations
satisfies axiom 7. Literally the same proof works for multilinear functions;
the proof that all other axioms are also satisfied is very similar.

5.1.2. Dimension of $L(V_1, V_2, \dots, V_p; V)$. Let $\mathcal B_1, \mathcal B_2, \dots, \mathcal B_p$ be bases in the
spaces $V_1, V_2, \dots, V_p$ respectively. Since a linear transformation is defined
by its action on a basis, a multilinear function $F \in L(V_1, V_2, \dots, V_p; V)$ is
defined by its values on all tuples

$b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}$,   $b^k_{j_k} \in \mathcal B_k$.

Since there are exactly

$(\dim V_1)(\dim V_2) \cdots (\dim V_p)$

such tuples, and each $F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p})$ is determined by $\dim V$ coordinates
(in some basis in $V$), we can conclude that $F \in L(V_1, V_2, \dots, V_p; V)$ is
determined by $(\dim V_1)(\dim V_2) \cdots (\dim V_p)(\dim V)$ entries. In other words,


$\dim L(V_1, V_2, \dots, V_p; V) = (\dim V_1)(\dim V_2) \cdots (\dim V_p)(\dim V)$;

in particular, if the target space is the field of scalars $\mathbb F$ (i.e. if we are dealing
with multilinear functionals)


$\dim L(V_1, V_2, \dots, V_p; \mathbb F) = (\dim V_1)(\dim V_2) \cdots (\dim V_p)$.

It is easy to find a basis in $L(V_1, V_2, \dots, V_p; \mathbb F)$. Namely, for $k = 1, 2, \dots, p$
let the system $\mathcal B_k = \{b^k_j\}_{j=1}^{\dim V_k}$ be a basis in $V_k$ and let $\mathcal B'_k = \{\tilde b^k_j\}_{j=1}^{\dim V_k}$
be its dual system, $\tilde b^k_j \in V'_k$.

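Proposition 5.2. The system

$\tilde b^1_{j_1} \otimes \tilde b^2_{j_2} \otimes \dots \otimes \tilde b^p_{j_p}$,   $1 \le j_k \le \dim V_k$,  $k = 1, 2, \dots, p$,

is a basis in the space $L(V_1, V_2, \dots, V_p; \mathbb F)$.

Here $\tilde b^1_{j_1} \otimes \tilde b^2_{j_2} \otimes \dots \otimes \tilde b^p_{j_p}$ is the tensor product of functionals, as defined
in (5.1).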


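Proof. We want to represent $F$ as


(5.2)  $F = \sum_{j_1, j_2, \dots, j_p} \alpha_{j_1, j_2, \dots, j_p} \, \tilde b^1_{j_1} \otimes \tilde b^2_{j_2} \otimes \dots \otimes \tilde b^p_{j_p}$.


Since $\tilde b^k_j(b^k_l) = \delta_{j,l}$, we have


(5.3)  $\tilde b^1_{j_1} \otimes \tilde b^2_{j_2} \otimes \dots \otimes \tilde b^p_{j_p}(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}) = 1$  and

(5.4)  $\tilde b^1_{j_1} \otimes \tilde b^2_{j_2} \otimes \dots \otimes \tilde b^p_{j_p}(b^1_{j'_1}, b^2_{j'_2}, \dots, b^p_{j'_p}) = 0$

for any collection of indices $j'_1, j'_2, \dots, j'_p$ different from $j_1, j_2, \dots, j_p$.


Therefore, applying (5.2) to the tuple $b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}$, we get


$\alpha_{j_1, j_2, \dots, j_p} = F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p})$,


so the representation (5.2) is unique (if it exists).


On the other hand, defining $\alpha_{j_1, j_2, \dots, j_p} := F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p})$ and using
(5.3) and (5.4), we can see that the equality (5.2) holds on all tuples of the
form $b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p}$. Since a multilinear function is determined by its values
on such tuples, the decomposition (5.2) holds, so we indeed have a
basis.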
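
The proof can be traced on a small example. The sketch below assumes $V_1 = \mathbb R^2$, $V_2 = \mathbb R^3$ with the standard bases (so the dual functional number $j$ simply returns the $j$th coordinate), and a sample bilinear functional `F` chosen only for illustration.

```python
from itertools import product

def F(x, y):
    """A sample bilinear functional on R^2 x R^3 (an illustrative choice)."""
    return 2 * x[0] * y[0] - x[0] * y[2] + 3 * x[1] * y[1]

def std_basis(n):
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

B1, B2 = std_basis(2), std_basis(3)

# Coefficients alpha[j1][j2] = F(b^1_{j1}, b^2_{j2}), as in the proof.
alpha = [[F(b1, b2) for b2 in B2] for b1 in B1]

def expansion(x, y):
    """Right side of (5.2) evaluated at (x, y): for the standard bases the
    product of dual functionals at (x, y) is x[j1] * y[j2]."""
    return sum(alpha[j1][j2] * x[j1] * y[j2]
               for j1, j2 in product(range(2), range(3)))

x, y = [1, -2], [4, 0, 5]
```

The number of coefficients is $2 \cdot 3 = 6$, in agreement with the dimension formula, and `expansion(x, y)` reproduces `F(x, y)`.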


5.2. Tensor Products. 


Definition. Let $V_1, V_2, \dots, V_p$ be vector spaces. The tensor product


$V_1 \otimes V_2 \otimes \dots \otimes V_p$


of the spaces $V_k$ is simply the set $L(V'_1, V'_2, \dots, V'_p; \mathbb F)$ of multilinear functionals;
here $V'_k$ is the dual of $V_k$.

Remark 5.3. By Proposition 5.2 we get that if $\mathcal B_k = \{b^k_j\}_{j=1}^{\dim V_k}$ is a basis
in $V_k$ for $k = 1, 2, \dots, p$, then the system


(5.5)  $b^1_{j_1} \otimes b^2_{j_2} \otimes \dots \otimes b^p_{j_p}$,   $1 \le j_k \le \dim V_k$,  $k = 1, 2, \dots, p$,

is a basis in $V_1 \otimes V_2 \otimes \dots \otimes V_p$.
Here we treat a vector $v_k \in V_k$ as a linear functional on $V'_k$; the tensor
product of vectors $v_1 \otimes v_2 \otimes \dots \otimes v_p$ is then defined according to (5.1).


Remark. The tensor product $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is clearly linear
in each argument $v_k$. In other words, the map $(v_1, v_2, \dots, v_p) \mapsto v_1 \otimes v_2 \otimes
\dots \otimes v_p$ is a multilinear function with values in $V_1 \otimes V_2 \otimes \dots \otimes V_p$. We
leave the proof as an exercise for the reader, see Problem 5.1 below.


Remark. Note that the set $\{v_1 \otimes v_2 \otimes \dots \otimes v_p : v_k \in V_k\}$ of tensor
products of vectors is strictly smaller than $V_1 \otimes V_2 \otimes \dots \otimes V_p$, see Problem 5.2
below.
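
A quick numerical illustration of the last remark, assuming $V_1 = V_2 = \mathbb R^2$ with the standard basis $e_1, e_2$: in the basis $e_i \otimes e_j$ an element of $V_1 \otimes V_2$ is a $2 \times 2$ array of coordinates, and the array of a product $v \otimes w$ has entries $v_i w_j$, so its determinant is always 0. The helper names are hypothetical.

```python
def outer(v, w):
    """Coordinates of v (x) w in the basis e_i (x) e_j: entries v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def add(m1, m2):
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

e1, e2 = [1, 0], [0, 1]
pure = outer([2, 3], [5, -1])                # a tensor product of two vectors
mixed = add(outer(e1, e1), outer(e2, e2))    # e1 (x) e1 + e2 (x) e2
```

Since `det2(mixed)` is nonzero, the element $e_1 \otimes e_1 + e_2 \otimes e_2$ cannot be written as a single product $v \otimes w$.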


5.2.1. Lifting a multilinear function to a linear transformation on the tensor 
product. 


Proposition 5.4. For any multilinear function $F \in L(V_1, V_2, \dots, V_p; V)$
there exists a unique linear transformation $T : V_1 \otimes V_2 \otimes \dots \otimes V_p \to V$
extending $F$, i.e. such that
(5.6)  $F(v_1, v_2, \dots, v_p) = T \, v_1 \otimes v_2 \otimes \dots \otimes v_p$
for all choices of vectors $v_k \in V_k$, $1 \le k \le p$.

Remark. If $T : V_1 \otimes V_2 \otimes \dots \otimes V_p \to V$ is a linear transformation, then
trivially the function $F$,

$F(v_1, v_2, \dots, v_p) = T \, v_1 \otimes v_2 \otimes \dots \otimes v_p$,


is a multilinear function in $L(V_1, V_2, \dots, V_p; V)$. This follows immediately
from the fact that the expression $v_1 \otimes v_2 \otimes \dots \otimes v_p$ is linear in each variable
$v_k$.


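Proof of Proposition 5.4. Define $T$ on the basis (5.5) by


$T \, b^1_{j_1} \otimes b^2_{j_2} \otimes \dots \otimes b^p_{j_p} = F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p})$


and then extend it by linearity to the whole space $V_1 \otimes V_2 \otimes \dots \otimes V_p$. Uniqueness is
clear, since (5.6) prescribes the values of $T$ on the basis (5.5). To complete
the proof we need to show that (5.6) holds for all choices of vectors $v_k \in V_k$,
$1 \le k \le p$ (so far we know this only when each $v_k$ is one of the vectors $b^k_{j_k}$).


To prove that, let us decompose $v_k$ as


$v_k = \sum_{j_k} \alpha^k_{j_k} b^k_{j_k}$,   $k = 1, 2, \dots, p$.

Using linearity in each variable $v_k$ we get


$v_1 \otimes v_2 \otimes \dots \otimes v_p = \sum_{j_1, j_2, \dots, j_p} \alpha^1_{j_1} \alpha^2_{j_2} \cdots \alpha^p_{j_p} \, b^1_{j_1} \otimes b^2_{j_2} \otimes \dots \otimes b^p_{j_p}$,

$F(v_1, v_2, \dots, v_p) = \sum_{j_1, j_2, \dots, j_p} \alpha^1_{j_1} \alpha^2_{j_2} \cdots \alpha^p_{j_p} \, F(b^1_{j_1}, b^2_{j_2}, \dots, b^p_{j_p})$,


so by the definition of $T$ identity (5.6) holds.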
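
The construction in the proof can be mimicked in code. The sketch below assumes $V_1 = V_2 = V = \mathbb R^2$ with standard bases, stores an element of $V_1 \otimes V_2$ as a dictionary of coordinates in the basis $e_i \otimes e_j$, and uses a sample bilinear map `F`; all names are illustrative.

```python
from itertools import product

def F(x, y):
    """Sample bilinear map R^2 x R^2 -> R^2 (an illustrative choice)."""
    return [x[0] * y[1] - x[1] * y[0], x[0] * y[0] + x[1] * y[1]]

n = 2
E = [[1 if i == j else 0 for i in range(n)] for j in range(n)]

# T on the basis vector e_i (x) e_j is defined to be F(e_i, e_j).
T_on_basis = {(i, j): F(E[i], E[j]) for i, j in product(range(n), repeat=2)}

def T(tensor):
    """Linear extension of T; tensor is a dict {(i, j): coordinate}."""
    out = [0, 0]
    for (i, j), c in tensor.items():
        out = [o + c * t for o, t in zip(out, T_on_basis[(i, j)])]
    return out

def pure(x, y):
    """Coordinates of x (x) y in the basis e_i (x) e_j."""
    return {(i, j): x[i] * y[j] for i, j in product(range(n), repeat=2)}

x, y = [3, -1], [2, 5]
```

Evaluating `T(pure(x, y))` gives the same vector as `F(x, y)`, which is identity (5.6) for this pair.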


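5.2.2. Dual of a tensor product. As one can easily see, the dual of the tensor
product $V_1 \otimes V_2 \otimes \dots \otimes V_p$ is the tensor product of dual spaces $V'_1 \otimes V'_2 \otimes \dots \otimes V'_p$.

Indeed, by Proposition 5.4 and the remark after it, there is a natural one-to-one
correspondence between multilinear functionals in $L(V_1, V_2, \dots, V_p; \mathbb F)$
(i.e. the elements of $V'_1 \otimes V'_2 \otimes \dots \otimes V'_p$) and the linear transformations
$T : V_1 \otimes V_2 \otimes \dots \otimes V_p \to \mathbb F$ (i.e. the elements of the dual of $V_1 \otimes V_2 \otimes \dots \otimes V_p$).


Note that the bases from Remark 5.3 and Proposition 5.2 are dual
bases (in $V_1 \otimes V_2 \otimes \dots \otimes V_p$ and $V'_1 \otimes V'_2 \otimes \dots \otimes V'_p$ respectively). Knowing
the dual bases allows us to easily calculate the duality between the spaces
$V_1 \otimes V_2 \otimes \dots \otimes V_p$ and $V'_1 \otimes V'_2 \otimes \dots \otimes V'_p$, i.e. the expression $\langle x, x' \rangle$,
$x \in V_1 \otimes V_2 \otimes \dots \otimes V_p$, $x' \in V'_1 \otimes V'_2 \otimes \dots \otimes V'_p$.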


5.3. Covariant and contravariant tensors. Let $X_1, X_2, \dots, X_p$ be vector
spaces, and let $V_k$ be either $X_k$ or $X'_k$, $k = 1, 2, \dots, p$. For a multilinear
function $F \in L(V_1, V_2, \dots, V_p; V)$ we say that it is covariant in the variable
$v_k \in V_k$ if $V_k = X_k$ and contravariant in this variable if $V_k = X'_k$.

If a multilinear function is covariant (contravariant) in all variables, we
say that the multilinear function is covariant (contravariant). In general, if
a function is covariant in $r$ variables and contravariant in $s$ variables, we say
that the multilinear function is $r$-covariant $s$-contravariant (or simply an $(r, s)$
multilinear function, or that its valency is $(r, s)$).


Thus, a linear functional can be interpreted as 1-covariant tensor (recall, 
that we use the word tensor for the case of functionals, i.e. when the target 
space is the field of scalars F). By duality, a vector can be interpreted as 
1-contravariant tensor. 


Remark. At first the terminology might look a bit confusing: if a variable
is a vector (not a functional), it is a covariant variable but a contravariant
object. But notice that we did not say here a “covariant variable”: we said
that if $v_k \in X_k$ then the multilinear function is covariant in the variable
$v_k$. So, the covariant object is not $v_k$, but the “slot” in the tensor where we
put it!

So there is no contradiction: we put the contravariant objects into covariant
slots and vice versa.




Sometimes, slightly abusing the language, people talk about covariant 
(contravariant) variables or arguments. But it is usually meant that the 
corresponding “slots” in the tensor are covariant (contravariant), and not 
the variables as objects. 


5.3.1. Linear transformations as tensors. A linear transformation $T : X_1 \to X_2$
can be interpreted as a 1-covariant 1-contravariant tensor. Namely, the
bilinear functional $F$,


$F(x_1, x'_2) = \langle T x_1, x'_2 \rangle$,   $x_1 \in X_1$, $x'_2 \in X'_2$,

is covariant in the first variable $x_1$ and contravariant in the second one $x'_2$.


Conversely, 


Proposition 5.5. Given a 1-1 tensor $F \in L(X_1, X'_2; \mathbb F)$, there exists a
unique linear transformation $T : X_1 \to X_2$ such that


(5.7)  $F(x_1, x'_2) = \langle T x_1, x'_2 \rangle$
for all $x_1 \in X_1$, $x'_2 \in X'_2$.


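Proof. First of all note that the uniqueness is a trivial corollary of Lemma
1.3, cf. Problem 3.1 above. So we only need to prove the existence of $T$.


Let $\mathcal B_k = \{b^k_j\}_{j=1}^{\dim X_k}$ be a basis in $X_k$, and let $\mathcal B'_k = \{\tilde b^k_j\}_{j=1}^{\dim X_k}$ be the
dual basis in $X'_k$, $k = 1, 2$. Then define the matrix $A = \{a_{k,j}\}$, $k = 1, \dots, \dim X_2$, $j = 1, \dots, \dim X_1$,
by

$a_{k,j} = F(b^1_j, \tilde b^2_k)$.
Define $T$ to be the operator with matrix $[T]_{\mathcal B_2 \mathcal B_1} = A$. Clearly (see Remark
1.5)


(5.8)  $\langle T b^1_j, \tilde b^2_k \rangle = a_{k,j} = F(b^1_j, \tilde b^2_k)$,


which implies the equality (5.7). This can be easily seen by decomposing
$x_1 = \sum_j \alpha_j b^1_j$ and $x'_2 = \sum_k \beta_k \tilde b^2_k$, and using linearity in each argument.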
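
The matrix construction above can be checked on a small example. The sketch assumes $X_1 = \mathbb R^2$, $X_2 = \mathbb R^3$ with standard bases, so a covector in $X'_2$ is just a list of three coordinates; the functional `F` is an arbitrary illustrative choice.

```python
def F(x, f):
    """Sample 1-1 tensor: x in R^2 (a vector), f in (R^3)' (a covector)."""
    return 2 * x[0] * f[0] + x[1] * f[0] - x[0] * f[2] + 4 * x[1] * f[1]

def std_basis(n):
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

B1 = std_basis(2)         # basis of X_1
B2_dual = std_basis(3)    # dual basis of X_2', written in coordinates

# a[k][j] = F(b^1_j, dual b^2_k): k is the row index, j is the column index.
A = [[F(b, bd) for b in B1] for bd in B2_dual]

def apply(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def pairing(y, f):
    """<y, f> for a vector y and a covector f given by coordinates."""
    return sum(yi * fi for yi, fi in zip(y, f))

x, f = [1, -3], [2, 1, 5]
```

With `A` built this way, `pairing(apply(A, x), f)` coincides with `F(x, f)`, which is equality (5.7) for this pair.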

Another, more high brow explanation is that the tensors in the left and the
right sides of (5.7) coincide on a basis in $X_1 \otimes X'_2$ (see Remark 5.3 about the
basis), so they coincide. To be more precise, one should lift the bilinear forms
to the linear transformations (functionals) $X_1 \otimes X'_2 \to \mathbb F$ (see Proposition
5.4), and since the transformations coincide on a basis, they are equal.

One can also give an alternative, coordinate-free proof of the existence of $T$,
along the lines of the coordinate-free definition of the dual space (see Section
3.1.3). Namely, if we fix $x_1$, the function $F(x_1, x'_2)$ is linear in $x'_2$, so it is
a linear functional on $X'_2$, i.e. a vector in $X_2$.

Let us call this vector $T(x_1)$. So we defined a transformation $T : X_1 \to X_2$.
One can easily show that $T$ is a linear transformation by essentially
repeating the reasoning from Section 3.1.3. The equality (5.7) follows
automatically from the definition of $T$.


Remark. Note that we can also say that the function $F$ from Proposition
5.5 defines not the transformation $T$, but its adjoint. A priori, without
assuming anything (like the order of variables and its interpretation) we cannot
distinguish between a transformation and its adjoint.


Remark. Note that if we would like to follow the Einstein notation, the
entries $a_{j,k}$ of the matrix $A = [T]_{\mathcal B_2, \mathcal B_1}$ of the transformation $T$ should be
written as $a^j_k$. Then if $x^k$, $k = 1, 2, \dots, \dim X_1$, are the coordinates of the
vector $x \in X_1$, the $j$th coordinate of $y = Tx$ is given by

$y^j = a^j_k x^k$.


Recall that here we skip the sign of summation, but we mean the sum over
$k$. Note also that we preserve positions of the indices, so the index $j$ stays
upstairs. The index $k$ does not appear in the left side of the equation because
we sum over this index in the right side, and it gets “killed”.

Similarly, if $x_j$, $j = 1, 2, \dots, \dim X_2$, are the coordinates of the vector
$x' \in X'_2$, then the $k$th coordinate of $y' := T'x'$ is given by

$y_k = a^j_k x_j$

(again, skipping the sign of summation over $j$). Again, since we preserve
the position of the indices, the index $k$ in $y_k$ is a subscript.

Note that since $x \in X_1$ and $y = Tx \in X_2$ are vectors, according to the
conventions of the Einstein notation, the indices in their coordinates indeed
should be written as superscripts.

Similarly, $x' \in X'_2$ and $y' = T'x' \in X'_1$ are covectors, so indices in their
coordinates should be written as subscripts.

The Einstein notation emphasizes the fact mentioned in the previous
remark, that a 1-covariant 1-contravariant tensor gives us both a linear
transformation and its adjoint: the expression $a^j_k x^k$ gives the action of $T$,
and $a^j_k x_j$ gives the action of its adjoint $T'$.
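
The two contractions in this remark can be written out explicitly. In the sketch below the array `a[j][k]` stores $a^j_k$ (upper index as the row, lower index as the column); the numerical entries and the function names are illustrative assumptions.

```python
# a[j][k] stores a^j_k: the upper index j is the row, the lower index k the column.
a = [[1, 2, 0],
     [0, -1, 3]]          # T : X_1 (dim 3) -> X_2 (dim 2), illustrative entries

def act(a, x_up):
    """y^j = a^j_k x^k (sum over k): the action of T on vector coordinates."""
    return [sum(a[j][k] * x_up[k] for k in range(len(x_up)))
            for j in range(len(a))]

def act_adjoint(a, x_down):
    """y_k = a^j_k x_j (sum over j): the action of T' on covector coordinates."""
    return [sum(a[j][k] * x_down[j] for j in range(len(a)))
            for k in range(len(a[0]))]

x = [1, 1, 2]      # coordinates x^k of a vector in X_1
xp = [3, -1]       # coordinates x_j of a covector in X_2'

y = act(a, x)
yp = act_adjoint(a, xp)
lhs = sum(yj * fj for yj, fj in zip(y, xp))    # <T x, x'>
rhs = sum(xk * gk for xk, gk in zip(x, yp))    # <x, T' x'>
```

The same array of numbers serves both maps; only the index being summed changes, and `lhs == rhs` reflects the defining property of the adjoint.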


5.3.2. Polylinear transformations as tensors. More generally, any polylinear
transformation can be interpreted as a tensor. Namely, given a polylinear
transformation $F \in L(V_1, V_2, \dots, V_p; V)$ one can define the tensor
$\widetilde F \in L(V_1, V_2, \dots, V_p, V'; \mathbb F)$ by


(5.9)  $\widetilde F(v_1, v_2, \dots, v_p, v') = \langle F(v_1, v_2, \dots, v_p), v' \rangle$,   $v_k \in V_k$, $v' \in V'$.


Conversely,


Proposition 5.6. Given a tensor $\widetilde F \in L(V_1, V_2, \dots, V_p, V'; \mathbb F)$ there exists a
unique polylinear transformation $F \in L(V_1, V_2, \dots, V_p; V)$ such that (5.9) is
satisfied.


Proof. By Proposition 5.4 the tensor $\widetilde F$ can be extended to a linear
transformation (functional) $\widetilde T : V_1 \otimes V_2 \otimes \dots \otimes V_p \otimes V' \to \mathbb F$ such that


$\widetilde F(v_1, v_2, \dots, v_p, v') = \widetilde T(v_1 \otimes v_2 \otimes \dots \otimes v_p \otimes v')$


for all $v_k \in V_k$, $v' \in V'$.
If $w \in W := V_1 \otimes V_2 \otimes \dots \otimes V_p$ and $v' \in V'$, then


$w \otimes v' \in V_1 \otimes V_2 \otimes \dots \otimes V_p \otimes V'$.
So, we can define a bilinear functional (tensor) $G \in L(W, V'; \mathbb F)$ by
$G(w, v') := \widetilde T(w \otimes v')$.


By Proposition 5.5, $G$ gives rise to a linear transformation, i.e. there exists
a unique linear transformation $T : W \to V$ such that


$G(w, v') = \langle T w, v' \rangle$   $\forall w \in W$, $\forall v' \in V'$.
And the linear transformation $T$ gives us the polylinear map
$F \in L(V_1, V_2, \dots, V_p; V)$
by
$F(v_1, v_2, \dots, v_p) = T(v_1 \otimes v_2 \otimes \dots \otimes v_p)$,


see the Remark after Proposition 5.4.


The uniqueness of the transformation $F$ is, as in Proposition 5.5, a
trivial corollary of Lemma 1.3. We leave the details as an exercise for the
reader.


This section shows that 


tensors are universal objects in polylinear algebra, since any poly- 
linear transformation can be interpreted as a tensor and vice versa. 


Exercises. 


5.1. Show that the tensor product $v_1 \otimes v_2 \otimes \dots \otimes v_p$ of vectors is linear in each
argument $v_k$.


5.2. Show that the set $\{v_1 \otimes v_2 \otimes \dots \otimes v_p : v_k \in V_k\}$ of tensor products of vectors
is strictly smaller than $V_1 \otimes V_2 \otimes \dots \otimes V_p$.


5.3. Prove that the transformation $F$ from Proposition 5.6 is unique.


6. Change of coordinates formula for tensors.


The main reason for the differentiation of covariant and contravariant vari- 
ables is that under the change of bases, their coordinates change according 
to different rules. Thus, the entries of covariant and contravariant vectors 
change according to different rules as well. 


In this section we are going to investigate this in detail. Note that coordinate
representations are extremely important, since, for example, all numerical
computations (unlike the theoretical investigations) are performed
using some coordinate system.


6.1. Coordinate representation of a tensor. Let $F$ be an $r$-covariant
$s$-contravariant tensor, $r + s = p$. Let $x_1, \dots, x_r$ be the covariant variables
($x_k \in X_k$), and $f_1, \dots, f_s$ be the contravariant ones ($f_k \in X'_{r+k}$). Let us
write the covariant variables first, so the tensor will be written as
$F(x_1, \dots, x_r, f_1, \dots, f_s)$. For $k = 1, 2, \dots, p$ fix a basis $\mathcal B_k = \{b^{(k)}_j\}_{j=1}^{\dim X_k}$
in $X_k$, and let $\mathcal B'_k = \{\tilde b^{(k)}_j\}_{j=1}^{\dim X_k}$ be the dual basis in $X'_k$.


For a vector $x_k \in X_k$ let $x^j_{(k)}$, $j = 1, 2, \dots, \dim X_k$, be its coordinates in
the basis $\mathcal B_k$, and similarly, for $f_k \in X'_{r+k}$ let $f^{(k)}_j$, $j = 1, 2, \dots, \dim X_{r+k}$, be its
coordinates in the dual basis $\mathcal B'_{r+k}$ (note that in agreement with the Einstein
notation, the coordinates of a vector are indexed by a superscript, and the
coordinates of a covector are indexed by a subscript).
Proposition 6.1. Denote

(6.1)  $\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r} := F(b^{(1)}_{j_1}, \dots, b^{(r)}_{j_r}, \tilde b^{(r+1)}_{k_1}, \dots, \tilde b^{(r+s)}_{k_s})$.

Then, in the Einstein notation,


(6.2)  $F(x_1, \dots, x_r, f_1, \dots, f_s) = \varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r} \, x^{j_1}_{(1)} \cdots x^{j_r}_{(r)} \, f^{(1)}_{k_1} \cdots f^{(s)}_{k_s}$

(the summation here is over the indices $j_1, \dots, j_r$ and $k_1, \dots, k_s$).


Note that we use the notation $(1), \dots, (r)$ and $(1), \dots, (s)$ to emphasize
that these are not indices: the numbers in parentheses just show the
order of the arguments. Thus, the right side of (6.2) does not have any indices left
(all indices were used in summation), so it is just a number (for fixed $x_k$s
and $f_k$s).


Proof of Proposition 6.1. To show that (6.1) implies (6.2) we first notice
that (6.1) means that (6.2) holds when the $x_j$s and $f_k$s are elements of the
corresponding bases. Decomposing each argument $x_j$ and $f_k$ in the corresponding
basis and using linearity in each argument we can easily get (6.2).
The computation is rather simple, but because there are a lot of indices, the
formulas could be quite big and could look quite frightening.




To avoid writing too many huge formulas, we leave this computation to 
the reader as an exercise. 


We do not want the reader to feel cheated, so we present a different,
more “high brow” (abstract) explanation, which does not require any
computations! Namely, let us notice that the expressions in the left and the
right side of (6.2) define tensors. By Proposition 5.4 they can be lifted to
linear functionals on the tensor product $X_1 \otimes \dots \otimes X_r \otimes X'_{r+1} \otimes \dots \otimes X'_{r+s}$.


Rephrasing what we discussed in the beginning of the proof, we can say
that (6.1) means that the functionals coincide on all vectors


$b^{(1)}_{j_1} \otimes \dots \otimes b^{(r)}_{j_r} \otimes \tilde b^{(r+1)}_{k_1} \otimes \dots \otimes \tilde b^{(r+s)}_{k_s}$


of a basis in the tensor product, so the functionals (and therefore the tensors)
are equal.


The entries $\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r}$ are called the entries of the tensor $F$ in the bases
$\mathcal B_k$, $k = 1, 2, \dots, p$.

Now, for $k = 1, 2, \dots, p$, let $\mathcal A_k$ be a basis in $X_k$ (and $\mathcal A'_k$ be the dual
basis in $X'_k$). We want to investigate how the entries of the tensor $F$ change
when we change the bases from $\mathcal B_k$ to $\mathcal A_k$.


6.2. Change of coordinate formulas in Einstein notation. Let us
first consider the familiar cases of vectors and linear functionals, considered
above in Section 1.1.1, but write everything down using the Einstein notation.
Suppose we have in $X$ two bases, $\mathcal B$ and $\mathcal A$, and let


$A = [I]_{\mathcal A \mathcal B}$


be the change of coordinates matrix from $\mathcal B$ to $\mathcal A$. For a vector $x \in X$ let
$x^k$ be its coordinates in the basis $\mathcal B$ and $\tilde x^k$ be the coordinates in the basis
$\mathcal A$. Similarly, for $f \in X'$ let $f_k$ denote the coordinates in the basis $\mathcal B'$ and
$\tilde f_k$ the coordinates in the basis $\mathcal A'$ ($\mathcal B'$ and $\mathcal A'$ are the dual bases to $\mathcal B$ and
$\mathcal A$ respectively).


Denote by $(A)^j_k$ the entries of the matrix $A$: to be consistent with the
Einstein notation the superscript $j$ denotes the number of the row. Then
we can write the change of coordinate formula as


(6.3)  $\tilde x^j = (A)^j_k x^k$.

Similarly, let $(A^{-1})^k_j$ be the entries of $A^{-1}$: again the superscript is used to
denote the number of the row. Then we can write the change of coordinate
formula for the dual space as


(6.4)  $\tilde f_j = (A^{-1})^k_j f_k$;

the summation here is over the index $k$ (i.e. along the columns of $A^{-1}$), so
the change of coordinate matrix in this case is indeed $(A^{-1})^T$.


Let us emphasize that we did not prove anything here: we only rewrote
formula (1.1) from Section 1.1.1 using the Einstein notation.
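
A small numerical check of the two rules, under the assumption that the change of coordinates matrix is the $2 \times 2$ matrix below (chosen only because its inverse has integer entries); the function names are hypothetical. The point of the check is that the number $f(x)$ does not depend on the basis used to compute it.

```python
# Change of coordinates matrix A = [I]_{A,B} and its inverse (illustrative).
A = [[2, 1],
     [1, 1]]
A_inv = [[1, -1],
         [-1, 2]]

def new_vector_coords(A, x):
    """(6.3): new x^j = (A)^j_k x^k; the first (upper) index is the row."""
    return [sum(A[j][k] * x[k] for k in range(len(x))) for j in range(len(A))]

def new_covector_coords(A_inv, f):
    """(6.4): new f_j = (A^{-1})^k_j f_k; the sum runs down column j of A^{-1}."""
    return [sum(A_inv[k][j] * f[k] for k in range(len(f))) for j in range(len(f))]

x = [3, -2]     # coordinates of a vector in the old basis
f = [5, 4]      # coordinates of a functional in the old dual basis

x_new = new_vector_coords(A, x)
f_new = new_covector_coords(A_inv, f)

old_value = sum(fk * xk for fk, xk in zip(f, x))
new_value = sum(fk * xk for fk, xk in zip(f_new, x_new))
```

Vector coordinates are multiplied by $A$ and covector coordinates by $(A^{-1})^T$, so the two changes cancel in the pairing.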


Remark. While it is not needed in what follows, let us play a bit more with
the Einstein notation. Namely, the equations


$A^{-1} A = I$  and  $A A^{-1} = I$

can be rewritten in the Einstein notation as


$(A^{-1})^j_k (A)^k_l = \delta_{j,l}$  and  $(A)^j_k (A^{-1})^k_l = \delta_{j,l}$,


respectively.


6.3. Change of coordinates formula for tensors. Now we are ready 
to give the change of coordinate formula for general tensors. 

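For $k = 1, 2, \dots, p := r + s$ let $A_k := [I]_{\mathcal A_k, \mathcal B_k}$ be the change of coordinates
matrices, and let $A_k^{-1}$ be their inverses.


As in Section 6.2 we denote by $(A)^j_k$ the entries of a matrix $A$, with the
agreement that the superscript gives the number of the row.


Proposition 6.2. Given an $r$-covariant $s$-contravariant tensor $F$ let


$\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r}$  and  $\tilde\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r}$


be its entries in the bases $\mathcal B_k$ (the old ones) and $\mathcal A_k$ (the new ones)
respectively. In the above notation


$\tilde\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r} = \varphi^{k'_1, \dots, k'_s}_{j'_1, \dots, j'_r} \, (A_1^{-1})^{j'_1}_{j_1} \cdots (A_r^{-1})^{j'_r}_{j_r} \, (A_{r+1})^{k_1}_{k'_1} \cdots (A_{r+s})^{k_s}_{k'_s}$


(the summation here is over the indices $j'_1, \dots, j'_r$ and $k'_1, \dots, k'_s$).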


Because of many indices, the formula in this proposition looks very com- 
plicated. However if one understands the main idea, the formula will turn 
out to be quite simple and easy to memorize. 


To explain the main idea let us, slightly abusing the language, express
this formula “in plain English”. Namely, we can say that


To express the “new” tensor entries $\tilde\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r}$ in terms of the “old”
ones $\varphi^{k_1, \dots, k_s}_{j_1, \dots, j_r}$, one needs for each covariant index (subscript) to apply
the covariant rule (6.4), and for each contravariant index (superscript)
to apply the contravariant rule (6.3).
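
As a sanity check of this rule in the simplest mixed case $r = s = 1$, the sketch below transforms the entries $\varphi^k_j$ of a (1,1) tensor and verifies that the value $F(x, f)$ is unchanged. It assumes both variables live in (the dual of) the same two-dimensional space, so one matrix `A` serves both slots; all numbers are illustrative.

```python
A = [[2, 1], [1, 1]]          # A = [I]_{A,B}, illustrative
A_inv = [[1, -1], [-1, 2]]    # its inverse
n = 2

# phi[k][j] stores phi^k_j: one contravariant index k, one covariant index j.
phi = [[1, 2],
       [3, 4]]

# new phi^k_j = phi^{k'}_{j'} (A^{-1})^{j'}_j (A)^k_{k'}
phi_new = [[sum(phi[k1][j1] * A_inv[j1][j] * A[k][k1]
                for j1 in range(n) for k1 in range(n))
            for j in range(n)]
           for k in range(n)]

def value(phi, x_up, f_down):
    """F(x, f) = phi^k_j x^j f_k, i.e. (6.2) with r = s = 1."""
    return sum(phi[k][j] * x_up[j] * f_down[k]
               for j in range(n) for k in range(n))

x, f = [3, -2], [5, 4]
x_new = [sum(A[j][k] * x[k] for k in range(n)) for j in range(n)]        # (6.3)
f_new = [sum(A_inv[k][j] * f[k] for k in range(n)) for j in range(n)]    # (6.4)
```

In matrix language `phi_new` is $A \varphi A^{-1}$, the familiar similarity rule for the matrix of a linear transformation.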


Proof of Proposition 6.2. Informally, the idea of the proof is very simple:
we just change the bases one at a time, applying each time the change
of coordinate formulas (6.3) or (6.4), depending on whether the tensor is
covariant or contravariant in the corresponding variable.


To write the rigorous formal proof we will use induction in $r$ and
$s$ (the number of covariant and contravariant arguments of the tensor).
The proposition is true for $r = 1$, $s = 0$ and for $r = 0$, $s = 1$, see (6.4) or
(6.3) respectively.

Assuming now that the proposition is proved for some $r$ and $s$, let us
prove it for $r + 1$, $s$ and for $r$, $s + 1$.


Let us do the latter case; the other one is done similarly. The main idea
is that we first change $p = r + s$ bases and use the induction hypothesis;
then we change the last one and use (6.3).


Namely, let $\hat\varphi^{k_1, \dots, k_{s+1}}_{j_1, \dots, j_r}$ be the entries of an $(r, s+1)$ tensor $F$ in the bases
$\mathcal A_1, \dots, \mathcal A_p, \mathcal B_{p+1}$, $p = r + s$.

Let us fix the index $k_{s+1}$ and consider the $r$-covariant $s$-contravariant
tensor $F(x_1, \dots, x_r, f_1, \dots, f_s, \tilde b^{(p+1)}_{k_{s+1}})$, where $x_1, \dots, x_r, f_1, \dots, f_s$ are the
variables. Clearly

$\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r}$  and  $\hat\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r}$

are its entries in the bases $\mathcal B_1, \dots, \mathcal B_p$ and $\mathcal A_1, \dots, \mathcal A_p$ respectively (can you
see why?). Recall that the index $k_{s+1}$ here is fixed.

By the induction hypothesis


(6.5)  $\hat\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r} = \varphi^{k'_1, \dots, k'_s, k_{s+1}}_{j'_1, \dots, j'_r} \, (A_1^{-1})^{j'_1}_{j_1} \cdots (A_r^{-1})^{j'_r}_{j_r} \, (A_{r+1})^{k_1}_{k'_1} \cdots (A_{r+s})^{k_s}_{k'_s}$.

Note that we did not assume anything about the index $k_{s+1}$, so (6.5) holds
for all $k_{s+1}$.


Now let us fix indices $j_1, \dots, j_r, k_1, \dots, k_s$ and consider the 1-contravariant
tensor

$F(a^{(1)}_{j_1}, \dots, a^{(r)}_{j_r}, \tilde a^{(r+1)}_{k_1}, \dots, \tilde a^{(r+s)}_{k_s}, f_{s+1})$

of the variable $f_{s+1}$. Here $a^{(k)}_j$ are the vectors in the basis $\mathcal A_k$, and $\tilde a^{(k)}_j$ are
the vectors in the dual basis $\mathcal A'_k$.


It is again easy to see that

$\hat\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r}$  and  $\tilde\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r}$,

$k_{s+1} = 1, 2, \dots, \dim X_{p+1}$, are the entries of this functional in the bases $\mathcal B_{p+1}$
and $\mathcal A_{p+1}$ respectively. According to (6.3)


$\tilde\varphi^{k_1, \dots, k_s, k_{s+1}}_{j_1, \dots, j_r} = \hat\varphi^{k_1, \dots, k_s, k'_{s+1}}_{j_1, \dots, j_r} \, (A_{p+1})^{k_{s+1}}_{k'_{s+1}}$,

and since we did not assume anything about the indices $j_1, \dots, j_r, k_1, \dots, k_s$,
the above identity holds for all their combinations. Combining this with
(6.5) we get that the proposition holds for tensors of valency $(r, s+1)$.

The case of valency $(r + 1, s)$ is treated in absolutely the same way: the
only difference is that in the end we get a 1-covariant tensor and use (6.4)
instead of (6.3).


Chapter 9 


Advanced spectral theory


1. Cayley—Hamilton Theorem 


Theorem 1.1 (Cayley-Hamilton). Let $A$ be a square matrix, and let $p(\lambda) =
\det(A - \lambda I)$ be its characteristic polynomial. Then $p(A) = 0$.


A wrong proof. The proof looks ridiculously simple: plugging $A$ instead
of $\lambda$ in the definition of the characteristic polynomial we get


$p(A) = \det(A - AI) = \det \mathbf 0 = 0$.


But this is a wrong proof! To see why, let us analyze what the theorem
states. It states that if we compute the characteristic polynomial

$\det(A - \lambda I) = p(\lambda) = \sum_{k=0}^{n} c_k \lambda^k$

and then plug the matrix $A$ instead of $\lambda$ to get

$p(A) := \sum_{k=0}^{n} c_k A^k = c_0 I + c_1 A + \dots + c_n A^n$,


then the result will be the zero matrix.


It is not clear why we get the same result if we just plug $A$ instead of
$\lambda$ in the determinant $\det(A - \lambda I)$. Moreover, it is easy to see that with
the exception of the trivial case of $1 \times 1$ matrices we will get a different object.
Namely, $A - AI$ is the zero matrix, and its determinant is just the number 0.
But $p(A)$ is a matrix, and the theorem claims that this matrix is the zero
matrix. Thus we are comparing apples and oranges. Even though in both
cases we got zero, these are different zeroes: the number zero and the zero
matrix!

Margin note. The “continuous” proof below illustrates an important idea: often it
is sufficient to consider only a typical, generic situation. It goes beyond
the scope of the book, but let us mention, without going into details,
that a generic (i.e. typical) matrix is diagonalizable.
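
To see concretely that $p(A)$ is a matrix obtained by substituting $A$ into the expanded polynomial, here is a small sketch for a $2 \times 2$ matrix, where $\det(A - \lambda I) = \lambda^2 - (\operatorname{trace} A)\lambda + \det A$. The matrix entries and the helper names are illustrative assumptions.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_2x2(A):
    """Coefficients (c0, c1, c2) of det(A - t I) = c0 + c1 t + c2 t^2 (2x2 only)."""
    (a, b), (c, d) = A
    return (a * d - b * c, -(a + d), 1)

def poly_at_matrix(coeffs, A):
    """p(A) = c0 I + c1 A + c2 A^2 + ...; the result is a matrix, not a number."""
    n = len(A)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = [[r + c * p for r, p in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
        power = matmul(power, A)
    return result

A = [[1, 2], [3, 4]]
p_of_A = poly_at_matrix(char_poly_2x2(A), A)
```

For this matrix `p_of_A` is the $2 \times 2$ zero matrix, as the theorem predicts; a single example is of course an illustration, not a proof.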


Let us present another proof, which is based on some ideas from analysis. 


A “continuous” proof. The proof is based on several observations. First 
of all, the theorem is trivial for diagonal matrices, and so for matrices similar 
to diagonal (i.e. for diagonalizable matrices), see Problem 1.1 below. 


The second observation is that any matrix can be approximated (as closely
as we want) by diagonalizable matrices. Since any operator has an upper
triangular matrix in some orthonormal basis (see Theorem 1.1 in Chapter
6), we can assume without loss of generality that $A$ is an upper triangular
matrix.


We can perturb the diagonal entries of $A$ (as little as we want) to make them
all different, so the perturbed matrix $\tilde A$ is diagonalizable (the eigenvalues of a
triangular matrix are its diagonal entries, see Section 1.7 in Chapter 4, and
by Corollary 2.3 in Chapter 4 an $n \times n$ matrix with $n$ distinct eigenvalues
is diagonalizable).

As I just mentioned, we can perturb the diagonal entries of $A$ as little
as we want, so the Frobenius norm $\|A - \tilde A\|_2$ is as small as we want. Therefore
one can find a sequence of diagonalizable matrices $A_k$ such that $A_k \to A$ as
$k \to \infty$ (for example, such that $\|A_k - A\|_2 \to 0$ as $k \to \infty$). It can be shown
that the characteristic polynomials $p_k(\lambda) = \det(A_k - \lambda I)$ converge to the
characteristic polynomial $p(\lambda) = \det(A - \lambda I)$ of $A$. Therefore


$p(A) = \lim_{k \to \infty} p_k(A_k)$.


But as we just discussed above, the Cayley-Hamilton Theorem is trivial for
diagonalizable matrices, so $p_k(A_k) = 0$. Therefore $p(A) = \lim_{k \to \infty} 0 = 0$.


This proof is intended for a reader who is comfortable with such ideas
from analysis as continuity and convergence¹. Such a reader should be able
to fill in all the details, and for him/her this proof should look extremely
easy and natural.


However, for others, who are not comfortable yet with these ideas, the
proof definitely may look strange. It may even look like some kind of cheating,
although, let me repeat, it is an absolutely correct and rigorous
proof (modulo some standard facts in analysis). So, let us present another
proof of the theorem, which is one of the “standard” proofs from linear algebra
textbooks.


¹ Here I mean analysis, i.e. a rigorous treatment of continuity, convergence, etc., and not
calculus, which, as it is taught now, is simply a collection of recipes.


A “standard” proof. We know, see Theorem 1.1 from Chapter 6, that
any square matrix is unitarily equivalent to an upper triangular one. Since for
any polynomial $p$ we have $p(UAU^{-1}) = U p(A) U^{-1}$, and the characteristic
polynomials of unitarily equivalent matrices coincide, it is sufficient to prove
the theorem only for upper triangular matrices.

So, let $A$ be an upper triangular matrix. We know that the diagonal entries
of a triangular matrix coincide with its eigenvalues, so let $\lambda_1, \lambda_2, \dots, \lambda_n$ be
the eigenvalues of $A$ ordered as they appear on the diagonal, so


$A = \begin{pmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$.

The characteristic polynomial $p(z) = \det(A - zI)$ of $A$ can be represented
as $p(z) = (\lambda_1 - z)(\lambda_2 - z) \cdots (\lambda_n - z) = (-1)^n (z - \lambda_1)(z - \lambda_2) \cdots (z - \lambda_n)$, so


$p(A) = (-1)^n (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)$.


Define subspaces $E_k := \operatorname{span}\{e_1, e_2, \dots, e_k\}$, where $e_1, e_2, \dots, e_n$ is the
standard basis in $\mathbb C^n$. Since the matrix of $A$ is upper triangular, the subspaces
$E_k$ are so-called invariant subspaces of the operator $A$, i.e. $A E_k \subset E_k$
(meaning that $Av \in E_k$ for all $v \in E_k$). Moreover, for any $v \in E_k$ and
any $\lambda$


$(A - \lambda I)v = Av - \lambda v \in E_k$,


because both $Av$ and $\lambda v$ are in $E_k$. Thus $(A - \lambda I) E_k \subset E_k$, i.e. $E_k$ is an
invariant subspace of $A - \lambda I$.


We can say even more about the subspace $(A - \lambda_k I) E_k$. Namely,
$(A - \lambda_k I) e_k \in \operatorname{span}\{e_1, e_2, \dots, e_{k-1}\}$, because only the first $k - 1$ entries
of the $k$th column of the matrix of $A - \lambda_k I$ can be non-zero. On the other
hand, for $j < k$ we have $(A - \lambda_k I) e_j \in E_j \subset E_{k-1}$ (because $E_j$ is an invariant
subspace of $A - \lambda_k I$).

Take any vector $v \in E_k$. By the definition of $E_k$, it can be represented
as a linear combination of the vectors $e_1, e_2, \dots, e_k$. Since all vectors
$e_1, e_2, \dots, e_k$ are transformed by $A - \lambda_k I$ to some vectors in $E_{k-1}$, we can
conclude that


(1.1)  $(A - \lambda_k I)v \in E_{k-1}$   $\forall v \in E_k$.

Take an arbitrary vector $x \in \mathbb C^n = E_n$. Applying (1.1) inductively with
$k = n, n-1, \dots, 2$ we get


$x_1 := (A - \lambda_n I)x \in E_{n-1}$,
$x_2 := (A - \lambda_{n-1} I)x_1 = (A - \lambda_{n-1} I)(A - \lambda_n I)x \in E_{n-2}$,
$\dots$
$x_{n-1} := (A - \lambda_2 I)x_{n-2} = (A - \lambda_2 I) \cdots (A - \lambda_{n-1} I)(A - \lambda_n I)x \in E_1$.

The last inclusion means that $x_{n-1} = \alpha e_1$. But $(A - \lambda_1 I)e_1 = 0$, so
$0 = (A - \lambda_1 I)x_{n-1} = (A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)x$.

Therefore $p(A)x = 0$ for all $x \in \mathbb C^n$, which means exactly that $p(A) = 0$.
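
The chain of inclusions in this proof can be watched on a concrete upper triangular matrix. The $3 \times 3$ matrix and the starting vector below are illustrative choices; each factor $A - \lambda_k I$ pushes the current vector one step down the chain $E_3 \supset E_2 \supset E_1 \supset \{0\}$.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shifted(A, lam):
    """The matrix A - lam * I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def apply(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

A = [[2, 1, 5],
     [0, 3, 4],
     [0, 0, 7]]           # upper triangular, eigenvalues 2, 3, 7 on the diagonal
lams = [A[i][i] for i in range(3)]

x = [1, 2, 3]
x1 = apply(shifted(A, lams[2]), x)    # in E_2: the last coordinate vanishes
x2 = apply(shifted(A, lams[1]), x1)   # in E_1: the last two coordinates vanish
x3 = apply(shifted(A, lams[0]), x2)   # the zero vector

# Up to the sign (-1)^n, p(A) is the product of the factors A - lam_k I.
P = matmul(matmul(shifted(A, lams[0]), shifted(A, lams[1])), shifted(A, lams[2]))
```

The product `P` is the zero matrix, which is the statement $p(A) = 0$ for this example.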


Exercises. 


1.1 (Cayley-Hamilton Theorem for diagonalizable matrices). As discussed in the
above section, the Cayley-Hamilton theorem states that if $A$ is a square matrix,
and


$p(\lambda) = \det(A - \lambda I) = \sum_{k=0}^{n} c_k \lambda^k$


is its characteristic polynomial, then $p(A) := \sum_{k=0}^{n} c_k A^k = 0$ (we assume that
by definition $A^0 = I$).

Prove this theorem for the special case when $A$ is similar to a diagonal matrix,
$A = SDS^{-1}$.

Hint: If $D = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ and $p$ is any polynomial, can you compute
$p(D)$? What about $p(A)$?


2. Spectral Mapping Theorem


2.1. Polynomials of operators. Let us also recall that for a square matrix
(an operator) $A$ and for a polynomial $p(z) = \sum_{k=0}^{N} a_k z^k$ the operator
$p(A)$ is defined by substituting $A$ instead of the independent variable,


$p(A) = \sum_{k=0}^{N} a_k A^k = a_0 I + a_1 A + a_2 A^2 + \dots + a_N A^N$;


here we agree that $A^0 = I$.

We know that generally matrix multiplication is not commutative, i.e.
generally $AB \ne BA$, so the order is essential. However

$A^k A^j = A^j A^k = A^{k+j}$,

and from here it is easy to show that for arbitrary polynomials $p$ and $q$

$p(A) q(A) = q(A) p(A) = R(A)$

where $R(z) = p(z) q(z)$.

That means that when dealing only with polynomials of an operator
$A$, one does not need to worry about non-commutativity, and can act as if $A$ were
simply an independent (scalar) variable. In particular, if a polynomial $p(z)$
can be represented as a product of monomials

$p(z) = a(z - z_1)(z - z_2) \cdots (z - z_N)$,

where $z_1, z_2, \dots, z_N$ are the roots of $p$, then $p(A)$ can be represented as


$p(A) = a(A - z_1 I)(A - z_2 I) \cdots (A - z_N I)$.
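
The commutation rule $p(A)q(A) = q(A)p(A) = R(A)$ is easy to test numerically. The sketch below represents a polynomial by its coefficient list $[a_0, a_1, \dots, a_N]$; the matrix, the two polynomials, and the helper names are illustrative assumptions.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """p(A) = a_0 I + a_1 A + ... + a_N A^N for coeffs = [a_0, ..., a_N]."""
    n = len(A)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]  # A^0 = I
    result = [[0] * n for _ in range(n)]
    for a in coeffs:
        result = [[r + a * p for r, p in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
        power = matmul(power, A)
    return result

def poly_mul(p, q):
    """Coefficients of the product polynomial R(z) = p(z) q(z)."""
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

A = [[1, 2], [3, 4]]
p = [1, 0, 2]      # p(z) = 1 + 2 z^2
q = [-3, 1]        # q(z) = z - 3

pq = matmul(poly_at_matrix(p, A), poly_at_matrix(q, A))
qp = matmul(poly_at_matrix(q, A), poly_at_matrix(p, A))
R = poly_at_matrix(poly_mul(p, q), A)
```

All three matrices `pq`, `qp` and `R` coincide, even though $A$ does not commute with a general matrix.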


2.2. Spectral Mapping Theorem. Let us recall that the spectrum $\sigma(A)$
of a square matrix (an operator) $A$ is the set of all eigenvalues of $A$ (not
counting multiplicities).


Theorem 2.1 (Spectral Mapping Theorem). For a square matrix $A$ and an
arbitrary polynomial $p$

$\sigma(p(A)) = p(\sigma(A))$.
In other words, $\mu$ is an eigenvalue of $p(A)$ if and only if $\mu = p(\lambda)$ for some
eigenvalue $\lambda$ of $A$.


Note that, as stated, this theorem does not say anything about multiplicities
of the eigenvalues.


Remark. Note that one inclusion is trivial. Namely, if $\lambda$ is an eigenvalue of
$A$, $Ax = \lambda x$ for some $x \ne 0$, then $A^k x = \lambda^k x$, and $p(A)x = p(\lambda)x$, so $p(\lambda)$
is an eigenvalue of $p(A)$. That means that the inclusion $p(\sigma(A)) \subset \sigma(p(A))$
is trivial.

If we consider a particular case $\mu = 0$ of the above theorem, we get the
following corollary.


Corollary 2.2. Let $A$ be a square matrix with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$
and let $p$ be a polynomial. Then $p(A)$ is invertible if and only if
$p(\lambda_k) \ne 0$   $\forall k = 1, 2, \dots, n$.
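
For an upper triangular matrix the theorem can be checked by inspection, because the eigenvalues are the diagonal entries and a polynomial of an upper triangular matrix is again upper triangular. The sketch below uses an illustrative $3 \times 3$ matrix and the polynomial $p(z) = z^2 - 4$; names and values are assumptions made for the example.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at_matrix(coeffs, A):
    """p(A) = a_0 I + a_1 A + ... + a_N A^N for coeffs = [a_0, ..., a_N]."""
    n = len(A)
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    result = [[0] * n for _ in range(n)]
    for a in coeffs:
        result = [[r + a * p for r, p in zip(rr, pr)]
                  for rr, pr in zip(result, power)]
        power = matmul(power, A)
    return result

def p(z):
    return z * z - 4          # p(z) = z^2 - 4

A = [[2, 1, 0],
     [0, -2, 5],
     [0, 0, 3]]               # upper triangular: eigenvalues 2, -2, 3

pA = poly_at_matrix([-4, 0, 1], A)
spectrum_A = {A[i][i] for i in range(3)}
spectrum_pA = {pA[i][i] for i in range(3)}   # diagonal of the triangular p(A)
```

Here $p(2) = p(-2) = 0$, so 0 is an eigenvalue of $p(A)$ and, in agreement with Corollary 2.2, $p(A)$ is not invertible.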


Proof of Theorem 2.1. As it was discussed above, the inclusion

$$p(\sigma(A)) \subset \sigma(p(A))$$

is trivial.

To prove the opposite inclusion $\sigma(p(A)) \subset p(\sigma(A))$ take a point $\mu \in
\sigma(p(A))$. Denote $q(z) = p(z) - \mu$, so $q(A) = p(A) - \mu I$. Since $\mu \in \sigma(p(A))$
the operator $q(A) = p(A) - \mu I$ is not invertible.

Let us represent the polynomial $q(z)$ as a product of monomials,

$$q(z) = a(z - z_1)(z - z_2)\ldots(z - z_N).$$

Then, as it was discussed above in Section 2.1, we can represent

$$q(A) = a(A - z_1 I)(A - z_2 I)\ldots(A - z_N I).$$

The operator $q(A)$ is not invertible, so one of the terms $A - z_k I$ must be
not invertible (because a product of invertible transformations is always
invertible). That means $z_k \in \sigma(A)$.

On the other hand $z_k$ is a root of $q$, so

$$0 = q(z_k) = p(z_k) - \mu$$

and therefore $\mu = p(z_k)$. So we have proved the inclusion $\sigma(p(A)) \subset p(\sigma(A))$.
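The theorem can be observed directly on an upper triangular matrix, whose eigenvalues are its diagonal entries. The sketch below is illustrative only: the matrix, the polynomial and the helper names are assumptions made for this example. It relies on the fact that a polynomial of an upper triangular matrix is again upper triangular.

```python
# Illustrative sketch: spectral mapping checked on an upper triangular matrix.

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, A):
    """p(A) = a_0 I + a_1 A + ... + a_N A^N for coeffs = [a_0, ..., a_N]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = identity(n)
    for a in coeffs:
        result = [[result[i][j] + a * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, A)
    return result

def poly_at(coeffs, z):
    """Value p(z) of the polynomial at a scalar z."""
    return sum(a * z ** k for k, a in enumerate(coeffs))

A = [[1, 5, -2],
     [0, 1, 7],
     [0, 0, 3]]        # upper triangular, so sigma(A) = {1, 3}
p = [-3, 0, 1]         # p(z) = z^2 - 3

pA = poly_of_matrix(p, A)

# p(A) is again upper triangular, so its eigenvalues are its diagonal entries.
is_upper = all(pA[i][j] == 0 for i in range(3) for j in range(i))
spectrum_A = {A[i][i] for i in range(3)}
spectrum_pA = {pA[i][i] for i in range(3)}
mapped = {poly_at(p, lam) for lam in spectrum_A}   # p(sigma(A))

print(is_upper, spectrum_pA == mapped)
```

Since neither $p(1)$ nor $p(3)$ is zero here, Corollary 2.2 says that $p(A)$ is invertible, which agrees with the non-zero diagonal of the triangular matrix `pA`.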


Exercises. 


2.1. An operator $A$ is called nilpotent if $A^k = 0$ for some $k$. Prove that if $A$ is
nilpotent, then $\sigma(A) = \{0\}$ (i.e. that 0 is the only eigenvalue of $A$).


Can you do it without using the spectral mapping theorem? 
3. Generalized eigenspaces. Geometric meaning of algebraic multiplicity


3.1. Invariant subspaces. 


Definition. Let $A : V \to V$ be an operator (linear transformation) in a
vector space $V$. A subspace $E$ of the vector space $V$ is called an invariant
subspace of the operator $A$ (or, shortly, $A$-invariant) if $AE \subset E$, i.e. if
$Av \in E$ for all $v \in E$.


If $E$ is $A$-invariant, then

$$A^2 E = A(AE) \subset AE \subset E,$$

i.e. $E$ is $A^2$-invariant.

Similarly one can show (using induction, for example), that if $AE \subset E$
then

$$A^k E \subset E \qquad \forall k \ge 1.$$

This implies that $p(A)E \subset E$ for any polynomial $p$, i.e. that:

any $A$-invariant subspace $E$ is an invariant subspace of $p(A)$.


If $E$ is an $A$-invariant subspace, then for all $v \in E$ the result $Av$ also
belongs to $E$. Therefore we can treat $A$ as an operator acting on $E$, not on
the whole space $V$.

Formally, for an $A$-invariant subspace $E$ we define the so-called restriction
$A|_E : E \to E$ of $A$ onto $E$ by

$$(A|_E)v = Av \qquad \forall v \in E.$$

Here we changed domain and target space of the operator, but the rule
assigning value to the argument remains the same.

We will need the following simple lemma.

Lemma 3.1. Let $p$ be a polynomial, and let $E$ be an $A$-invariant subspace.
Then

$$p(A|_E) = p(A)|_E.$$

Proof. The proof is trivial.


If $E_1, E_2, \ldots, E_r$ is a basis of $A$-invariant subspaces, and $A_k := A|_{E_k}$ are
the corresponding restrictions, then, since $AE_k = A_k E_k \subset E_k$, the operators
$A_k$ act independently of each other (do not interact), and to analyze the action
of $A$ we can analyze the operators $A_k$ separately.
In particular, if we pick a basis in each subspace $E_k$ and join them to
get a basis in $V$ (see Theorem 2.6 from Chapter 4) then the operator $A$ will
have in this basis the following block-diagonal form

$$A = \begin{pmatrix} A_1 & & & 0 \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_r \end{pmatrix}$$

(of course, here we have the correct ordering of the basis in $V$: first we take
a basis in $E_1$, then in $E_2$ and so on).

Our goal now is to pick a basis of invariant subspaces $E_1, E_2, \ldots, E_r$
such that the restrictions $A_k$ have a simple structure. In this case we will
get a basis in which the matrix of $A$ has a simple structure.

The eigenspaces $\operatorname{Ker}(A - \lambda_k I)$ would be good candidates, because the
restriction of $A$ to the eigenspace $\operatorname{Ker}(A - \lambda_k I)$ is simply $\lambda_k I$. Unfortunately,
as we know, eigenspaces do not always form a basis (they form a basis if and
only if $A$ can be diagonalized, cf. Theorem 2.1 in Chapter 4).


However, the so-called generalized eigenspaces will work. 


3.2. Generalized eigenspaces. 
Definition 3.2. A vector $v$ is called a generalized eigenvector (corresponding
to an eigenvalue $\lambda$) if $(A - \lambda I)^k v = 0$ for some $k \ge 1$.


The collection $E_\lambda$ of all generalized eigenvectors, together with 0, is called
the generalized eigenspace (corresponding to the eigenvalue $\lambda$).


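In other words one can represent the generalized eigenspace $E_\lambda$ as

$$E_\lambda = \bigcup_{k \ge 1} \operatorname{Ker}(A - \lambda I)^k. \tag{3.1}$$

The sequence $\operatorname{Ker}(A - \lambda I)^k$, $k = 1, 2, 3, \ldots$ is an increasing sequence of
subspaces, i.e.

$$\operatorname{Ker}(A - \lambda I)^k \subset \operatorname{Ker}(A - \lambda I)^{k+1} \qquad \forall k \ge 1.$$

The representation (3.1) does not look very simple, for it involves an infinite
union. However, the sequence of the subspaces $\operatorname{Ker}(A - \lambda I)^k$ stabilizes,
i.e.

$$\operatorname{Ker}(A - \lambda I)^k = \operatorname{Ker}(A - \lambda I)^{k+1} \qquad \forall k \ge k_\lambda,$$

so, in fact one can take the finite union.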


To show that the sequence of kernels stabilizes, let us notice that if for
finite-dimensional subspaces $E$ and $F$ we have $E \subsetneq F$ (the symbol $E \subsetneq F$ means
that $E \subset F$ but $E \ne F$), then $\dim E < \dim F$.

Since $\dim \operatorname{Ker}(A - \lambda I)^k \le \dim V < \infty$, it cannot grow to infinity, so at
some point

$$\operatorname{Ker}(A - \lambda I)^k = \operatorname{Ker}(A - \lambda I)^{k+1}.$$

The rest follows from the lemma below.

Lemma 3.3. Let for some $k$

$$\operatorname{Ker}(A - \lambda I)^k = \operatorname{Ker}(A - \lambda I)^{k+1}.$$

Then

$$\operatorname{Ker}(A - \lambda I)^{k+r} = \operatorname{Ker}(A - \lambda I)^{k+r+1} \qquad \forall r \ge 0.$$


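Proof. Let $v \in \operatorname{Ker}(A - \lambda I)^{k+r+1}$, i.e. $(A - \lambda I)^{k+r+1} v = 0$. Then

$$w := (A - \lambda I)^r v \in \operatorname{Ker}(A - \lambda I)^{k+1}.$$

But we know that $\operatorname{Ker}(A - \lambda I)^k = \operatorname{Ker}(A - \lambda I)^{k+1}$, so $w \in \operatorname{Ker}(A - \lambda I)^k$,
which means $(A - \lambda I)^k w = 0$. Recalling the definition of $w$ we get that

$$(A - \lambda I)^{k+r} v = (A - \lambda I)^k w = 0,$$

so $v \in \operatorname{Ker}(A - \lambda I)^{k+r}$. We proved that $\operatorname{Ker}(A - \lambda I)^{k+r+1} \subset \operatorname{Ker}(A - \lambda I)^{k+r}$.
The opposite inclusion is trivial.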
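The stabilization of the kernels can be watched numerically through their dimensions, using $\dim \operatorname{Ker} M = n - \operatorname{rank} M$. The sketch below is a hedged illustration in plain Python with exact rational arithmetic; the matrix and the helper names (`rank`, `kernel_dims`) are assumptions made for this example.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank of M by Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0                                        # number of pivots found so far
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue                             # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def kernel_dims(A, lam, max_power):
    """dim Ker (A - lam I)^k for k = 1, ..., max_power."""
    n = len(A)
    S = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    dims, power = [], S
    for _ in range(max_power):
        dims.append(n - rank(power))             # dim Ker = n - rank
        power = mat_mul(power, S)
    return dims

A = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 5]]

dims = kernel_dims(A, 2, 5)
# First k with dim Ker^k == dim Ker^{k+1}; by Lemma 3.3 the sequence is constant afterwards.
stabilizes_at = next(k for k in range(1, len(dims)) if dims[k - 1] == dims[k])
print(dims, stabilizes_at)
```

For this matrix the dimensions grow for the first three powers and then stay constant, as Lemma 3.3 predicts once two consecutive kernels coincide.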


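Definition. The number $d = d(\lambda)$ on which the sequence $\operatorname{Ker}(A - \lambda I)^k$
stabilizes, i.e. the number $d$ such that

$$\operatorname{Ker}(A - \lambda I)^{d-1} \subsetneq \operatorname{Ker}(A - \lambda I)^d = \operatorname{Ker}(A - \lambda I)^{d+1},$$

is called the depth of the eigenvalue $\lambda$.


It follows from the definition of the depth, that for the generalized
eigenspace $E_\lambda$

$$(A - \lambda I)^{d(\lambda)} v = 0 \qquad \forall v \in E_\lambda. \tag{3.2}$$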

Now let us summarize what we know about generalized eigenspaces.

a) $E_\lambda$ is an invariant subspace of $A$, $AE_\lambda \subset E_\lambda$.

b) If $d(\lambda)$ is the depth of the eigenvalue $\lambda$, then

$$\big((A - \lambda I)|_{E_\lambda}\big)^{d(\lambda)} = (A|_{E_\lambda} - \lambda I_{E_\lambda})^{d(\lambda)} = 0$$

(this is just another way of writing (3.2)).

c) $\sigma(A|_{E_\lambda}) = \{\lambda\}$, because the operator $A|_{E_\lambda} - \lambda I_{E_\lambda}$ is nilpotent, see
b), and the spectrum of a nilpotent operator consists of one point 0,
see Problem 2.1.


Now we are ready to state the main result of this section. Let $A : V \to V$.
Theorem 3.4. Let $\sigma(A)$ consist of $r$ points $\lambda_1, \lambda_2, \ldots, \lambda_r$, and let $E_k :=
E_{\lambda_k}$ be the corresponding generalized eigenspaces. Then the system of
subspaces $E_1, E_2, \ldots, E_r$ is a basis of subspaces in $V$.


Remark 3.5. If we join the bases in all generalized eigenspaces $E_k$, then
by Theorem 2.6 from Chapter 4 we will get a basis in the whole space.
In this basis the matrix of the operator $A$ has the block diagonal form
$A = \operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where $A_k := A|_{E_k}$, $E_k = E_{\lambda_k}$. It is also easy to
see, see (3.2), that the operators $N_k := A_k - \lambda_k I_{E_k}$ are nilpotent, $N_k^{d_k} = 0$,
where $d_k = d(\lambda_k)$ is the depth of $\lambda_k$.


Proof of Theorem 3.4. Let $m_k$ be the multiplicity of the eigenvalue $\lambda_k$,
so $p(z) = \prod_{k=1}^{r} (z - \lambda_k)^{m_k}$ is the characteristic polynomial of $A$. Define

$$p_k(z) = p(z)/(z - \lambda_k)^{m_k} = \prod_{j \ne k} (z - \lambda_j)^{m_j}.$$


Lemma 3.6.

$$(A - \lambda_k I)^{m_k}|_{E_k} = 0. \tag{3.3}$$


Proof. There are 2 possible simple proofs. The first one is to notice that
$m_k \ge d_k$, where $d_k$ is the depth of the eigenvalue $\lambda_k$, and use the fact that

$$(A - \lambda_k I)^{d_k}|_{E_k} = (A_k - \lambda_k I_{E_k})^{d_k} = 0,$$

where $A_k := A|_{E_k}$ (property b) of the generalized eigenspaces).


The second possibility is to notice that according to the Spectral Mapping
Theorem, see Corollary 2.2, the operator $p_k(A)|_{E_k} = p_k(A_k)$ is invertible.
By the Cayley-Hamilton Theorem (Theorem 1.1)

$$0 = p(A) = (A - \lambda_k I)^{m_k} p_k(A),$$

and restricting all operators to $E_k$ we get

$$0 = p(A_k) = (A_k - \lambda_k I_{E_k})^{m_k} p_k(A_k),$$

so

$$(A_k - \lambda_k I_{E_k})^{m_k} = p(A_k)\, p_k(A_k)^{-1} = 0 \cdot p_k(A_k)^{-1} = 0.$$


To prove the theorem define

$$q(z) = \sum_{k=1}^{r} p_k(z).$$

Since $p_k(\lambda_j) = 0$ for $j \ne k$ and $p_k(\lambda_k) \ne 0$, we can conclude that $q(\lambda_k) \ne 0$
for all $k$. Therefore, by the Spectral Mapping Theorem, see Corollary 2.2,
the operator

$$B = q(A)$$

is invertible.
Note that $BE_k \subset E_k$ (any $A$-invariant subspace is also $p(A)$-invariant).
Since $B$ is an invertible operator, $\dim(BE_k) = \dim E_k$, which together with
$BE_k \subset E_k$ implies $BE_k = E_k$. Multiplying the last identity by $B^{-1}$ we get
that $B^{-1}E_k = E_k$, i.e. that $E_k$ is an invariant subspace of $B^{-1}$.

Note also, that it follows from (3.3) that

$$p_k(A)|_{E_j} = 0 \qquad \forall j \ne k,$$

because $p_k(A)|_{E_j} = p_k(A_j)$ and $p_k(A_j)$ contains the factor $(A_j - \lambda_j I_{E_j})^{m_j} = 0$.

Define the operators $P_k$ by

$$P_k = B^{-1} p_k(A).$$

Lemma 3.7. For the operators $P_k$ defined above

a) $P_1 + P_2 + \ldots + P_r = I$;

b) $P_k|_{E_j} = 0$ for $j \ne k$;

c) $\operatorname{Ran} P_k \subset E_k$;

d) moreover, $P_k v = v$ $\forall v \in E_k$, so, in fact $\operatorname{Ran} P_k = E_k$.


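Proof. Property a) is trivial:

$$\sum_{k=1}^{r} P_k = B^{-1} \sum_{k=1}^{r} p_k(A) = B^{-1}B = I.$$

Property b) follows from (3.3). Indeed, for $j \ne k$ the operator $p_k(A)$ contains the
factor $(A - \lambda_j I)^{m_j}$, restriction of which to $E_j$ is zero. Therefore $p_k(A)|_{E_j} = 0$
and thus $P_k|_{E_j} = B^{-1} p_k(A)|_{E_j} = 0$.

To prove property c), recall that according to the Cayley-Hamilton Theorem
$p(A) = 0$. Since $p(z) = (z - \lambda_k)^{m_k} p_k(z)$, we have for $w = p_k(A)v$

$$(A - \lambda_k I)^{m_k} w = (A - \lambda_k I)^{m_k} p_k(A)v = p(A)v = 0.$$

That means, any vector $w$ in $\operatorname{Ran} p_k(A)$ is annihilated by some power of
$(A - \lambda_k I)$, which by definition means that $\operatorname{Ran} p_k(A) \subset E_k$. Since $E_k$ is an
invariant subspace of $B^{-1}$, we get $\operatorname{Ran} P_k = B^{-1} \operatorname{Ran} p_k(A) \subset E_k$.


To prove the last property, let us notice that it follows from (3.3) that
for $v \in E_k$

$$p_k(A)v = \sum_{j=1}^{r} p_j(A)v = Bv,$$

which implies $P_k v = B^{-1}Bv = v$.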


Now we are ready to complete the proof of the theorem. Take $v \in V$ and
define $v_k = P_k v$. Then according to Statement c) of Lemma 3.7, $v_k \in E_k$,
and by Statement a),

$$v = \sum_{k=1}^{r} v_k,$$

so $v$ admits a representation as a linear combination.


To show that this representation is unique, we can just note, that if $v$ is
represented as $v = \sum_{k=1}^{r} v_k$, $v_k \in E_k$, then it follows from the Statements
b) and d) of Lemma 3.7 that

$$P_k v = P_k(v_1 + v_2 + \ldots + v_r) = P_k v_k = v_k.$$
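The operators $P_k = B^{-1} p_k(A)$ from the proof can be computed explicitly for a small matrix. The following sketch is illustrative only: the matrix, its eigenvalues with multiplicities (assumed to be known in advance) and all helper names are choices made for this example, and exact rational arithmetic is used so that the identities of Lemma 3.7 can be checked exactly.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def shifted(A, lam):
    """The matrix A - lam I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]

def mat_pow(X, k):
    result = identity(len(X))
    for _ in range(k):
        result = mat_mul(result, X)
    return result

def inverse(M):
    """Gauss-Jordan inverse over rationals (raises StopIteration if M is singular)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + identity(n)[i] for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [x / pv for x in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

A = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 2]]
eigen = [(1, 2), (2, 1)]      # pairs (lambda_k, m_k), assumed known

def p_k_of_A(k):
    """p_k(A): product over j != k of (A - lambda_j I)^{m_j}."""
    result = identity(len(A))
    for j, (lam, m) in enumerate(eigen):
        if j != k:
            result = mat_mul(result, mat_pow(shifted(A, lam), m))
    return result

pk = [p_k_of_A(k) for k in range(len(eigen))]
B = mat_add(pk[0], pk[1])                 # B = q(A) = p_1(A) + p_2(A)
P = [mat_mul(inverse(B), M) for M in pk]  # P_k = B^{-1} p_k(A)

zero = [[0] * 3 for _ in range(3)]
print(mat_add(P[0], P[1]) == identity(3))                     # P_1 + P_2 = I
print(mat_mul(mat_pow(shifted(A, 1), 2), P[0]) == zero)       # Ran P_1 lies in E_1
print(mat_mul(shifted(A, 2), P[1]) == zero)                   # Ran P_2 lies in E_2
```

The decomposition $v = P_1 v + P_2 v$ then splits any vector into its components in the two generalized eigenspaces.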


3.3. Geometric meaning of algebraic multiplicity. 


Proposition 3.8. Algebraic multiplicity of an eigenvalue equals the dimen- 
sion of the corresponding generalized eigenspace. 


Proof. According to Remark 3.5, if we join bases in generalized eigenspaces
$E_k = E_{\lambda_k}$ to get a basis in the whole space, the matrix of $A$ in any such
basis has a block-diagonal form $\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where $A_k := A|_{E_k}$.
Operators $N_k = A_k - \lambda_k I_{E_k}$ are nilpotent, so $\sigma(N_k) = \{0\}$. Therefore,
the spectrum of the operator $A_k$ (recall that $A_k = N_k + \lambda_k I$) consists of
one eigenvalue $\lambda_k$ of (algebraic) multiplicity $n_k = \dim E_k$. The multiplicity
equals $n_k$ because an operator in a finite-dimensional space $V$ has exactly
$\dim V$ eigenvalues counting multiplicities, and $A_k$ has only one eigenvalue.

Note that we are free to pick bases in $E_k$, so let us pick them in such a
way that the corresponding blocks $A_k$ are upper triangular. Then

$$\det(A - \lambda I) = \prod_{k=1}^{r} \det(A_k - \lambda I_{E_k}) = \prod_{k=1}^{r} (\lambda_k - \lambda)^{n_k}.$$

But this means that the algebraic multiplicity of the eigenvalue $\lambda_k$ is $n_k =
\dim E_{\lambda_k}$.


3.4. An important application. The following corollary is very impor- 
tant for differential equations. 


Corollary 3.9. Any operator $A$ in $V$ can be represented as $A = D + N$,
where $D$ is diagonalizable (i.e. diagonal in some basis) and $N$ is nilpotent
($N^m = 0$ for some $m$), and $DN = ND$.


Proof. As we discussed above, see Remark 3.5, if we join the bases in $E_k$
to get a basis in $V$, then in this basis $A$ has the block diagonal form $A =
\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where $A_k := A|_{E_k}$, $E_k = E_{\lambda_k}$. The operators $N_k :=
A_k - \lambda_k I_{E_k}$ are nilpotent, and the operator $D = \operatorname{diag}\{\lambda_1 I_{E_1}, \lambda_2 I_{E_2}, \ldots, \lambda_r I_{E_r}\}$
is diagonal (in this basis). Notice also that $\lambda_k I_{E_k} N_k = N_k \lambda_k I_{E_k}$ (the identity
operator commutes with any operator), so the block diagonal operator
$N = \operatorname{diag}\{N_1, N_2, \ldots, N_r\}$ commutes with $D$, $DN = ND$. Since $A_k = \lambda_k I_{E_k} + N_k$
for each block, this $N$ gives the desired decomposition $A = D + N$.


This corollary allows us to compute functions of operators. Let us recall
that if $p$ is a polynomial of degree $d$, then $p(a + x)$ can be computed with
the help of Taylor's formula

$$p(a + x) = \sum_{k=0}^{d} \frac{p^{(k)}(a)}{k!} x^k.$$

This formula is an algebraic identity, meaning that for each polynomial $p$
we can check that the formula is true using formal algebraic manipulations
with $a$ and $x$ and not caring about their nature.


Since operators $D$ and $N$ commute, $DN = ND$, the same rules as for
usual (scalar) variables apply to them, and we can write (by plugging $D$
instead of $a$ and $N$ instead of $x$)

$$p(A) = p(D + N) = \sum_{k=0}^{d} \frac{p^{(k)}(D)}{k!} N^k.$$

Here, to compute the derivative $p^{(k)}(D)$ we first compute the $k$th derivative
of the polynomial $p(x)$ (using the usual rules from calculus), and then plug
$D$ instead of $x$.


But since $N$ is nilpotent, $N^m = 0$ for some $m$, only the first $m$ terms can
be non-zero, so

$$p(A) = p(D + N) = \sum_{k=0}^{m-1} \frac{p^{(k)}(D)}{k!} N^k.$$

If $m$ is much smaller than $d$, this formula makes computation of $p(A)$ much
easier.
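The truncated Taylor formula can be compared against direct evaluation of $p(A)$. The sketch below is a hedged illustration in plain Python; the commuting pair $D$, $N$, the polynomial and the helper names are assumptions made for this example.

```python
from fractions import Fraction
from math import factorial

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_of_matrix(coeffs, A):
    """p(A) = a_0 I + a_1 A + ... + a_N A^N for coeffs = [a_0, ..., a_N]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    power = identity(n)
    for a in coeffs:
        result = [[result[i][j] + a * power[i][j] for j in range(n)]
                  for i in range(n)]
        power = mat_mul(power, A)
    return result

def derivative(coeffs):
    """Coefficients of p'(z) from the coefficients [a_0, a_1, ...] of p(z)."""
    return [k * a for k, a in enumerate(coeffs)][1:]

D = [[2, 0, 0],
     [0, 2, 0],
     [0, 0, 3]]            # diagonal part
N = [[0, 1, 0],
     [0, 0, 0],
     [0, 0, 0]]            # nilpotent part, N^2 = 0, and DN = ND
A = [[D[i][j] + N[i][j] for j in range(3)] for i in range(3)]
m = 2                      # smallest power with N^m = 0
p = [1, 0, -4, 0, 0, 1]    # p(z) = z^5 - 4 z^2 + 1

# p(A) = sum_{k=0}^{m-1} p^{(k)}(D) N^k / k!
taylor = [[Fraction(0)] * 3 for _ in range(3)]
dk, Nk = p, identity(3)    # k-th derivative of p and N^k, starting with k = 0
for k in range(m):
    term = mat_mul(poly_of_matrix(dk, D), Nk)
    taylor = [[taylor[i][j] + Fraction(term[i][j], factorial(k)) for j in range(3)]
              for i in range(3)]
    dk, Nk = derivative(dk), mat_mul(Nk, N)

direct = poly_of_matrix(p, A)   # p(A) computed from powers of A
print(taylor == direct)
```

Here only two terms of the Taylor sum are needed, although $p$ has degree 5; this is the saving the formula promises when $m$ is much smaller than $d$.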

The same approach works if $p$ is not a polynomial, but an infinite power
series. For general power series we have to be careful about convergence
of all the series involved, so we cannot say that the formula is true for an
arbitrary power series $p(x)$. However, if the radius of convergence of the
power series is $\infty$, then everything works fine. In particular, if $p(x) = e^x$,
then, using the fact that $(e^x)' = e^x$ we get

$$e^A = e^{D+N} = \sum_{k=0}^{m-1} \frac{e^D}{k!} N^k = e^D \sum_{k=0}^{m-1} \frac{1}{k!} N^k.$$

This formula has important applications in differential equations.
Note that the fact that $ND = DN$ is essential here!
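As a small worked instance (an illustrative example, not taken from the text), consider a $2 \times 2$ matrix with a single eigenvalue $\lambda$. With $A = D + N$, $DN = ND$ and $N^2 = 0$, the exponential series stops after the linear term in $N$, so $e^A = e^D (I + N)$; the same computation applied to $tA = tD + tN$ gives the matrix exponential that appears in systems of linear differential equations.

```latex
A = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} = D + N, \qquad
D = \lambda I, \quad
N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \quad N^2 = 0,
\\[4pt]
e^{A} = e^{D}(I + N) = e^{\lambda} \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix},
\qquad
e^{tA} = e^{tD}(I + tN) = e^{\lambda t} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
```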


4. Structure of nilpotent operators


Recall that an operator $A$ in a vector space $V$ is called nilpotent if $A^k = 0$
for some exponent $k$.


In the previous section we have proved, see Remark 3.5, that if we join
the bases in all generalized eigenspaces $E_k = E_{\lambda_k}$ to get a basis in the
whole space, then the operator $A$ has in this basis a block diagonal form
$\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$ and operators $A_k$ can be represented as $A_k = \lambda_k I + N_k$,
where $N_k$ are nilpotent operators.

In each generalized eigenspace $E_k$ we want to pick a basis such that
the matrix of $A_k$ in this basis has the simplest possible form. Since the matrix
(in any basis) of the identity operator is the identity matrix, we need to find
a basis in which the nilpotent operator $N_k$ has a simple form.


Since we can deal with each $N_k$ separately, we will need to consider the
following problem:
For a nilpotent operator $A$ find a basis such that the matrix
of $A$ in this basis is simple.


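Let us see what it means for a matrix to have a simple form. It is easy to
see that the matrix

$$\begin{pmatrix} 0 & 1 & & & 0 \\ & 0 & 1 & & \\ & & \ddots & \ddots & \\ & & & 0 & 1 \\ 0 & & & & 0 \end{pmatrix} \tag{4.1}$$

is nilpotent.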


These matrices (together with $1 \times 1$ zero matrices) will be our "building
blocks". Namely, we will show that for any nilpotent operator one can find
a basis such that the matrix of the operator in this basis has the block
diagonal form $\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where each $A_k$ is either a block of form
(4.1) or a $1 \times 1$ zero block.


Let us see what we should be looking for. Suppose the matrix of an
operator $A$ has in a basis $v_1, v_2, \ldots, v_p$ the form (4.1). Then

$$Av_1 = 0 \tag{4.2}$$

and

$$Av_{k+1} = v_k, \qquad k = 1, 2, \ldots, p - 1. \tag{4.3}$$

Thus we have to be looking for the chains of vectors $v_1, v_2, \ldots, v_p$ satisfying
the above relations (4.2), (4.3).


4.1. Cycles of generalized eigenvectors. 


Definition. Let $A$ be a nilpotent operator. A chain of non-zero vectors
$v_1, v_2, \ldots, v_p$ satisfying relations (4.2), (4.3) is called a cycle of generalized
eigenvectors of $A$. The vector $v_1$ is called the initial vector of the cycle, the
vector $v_p$ is called the end vector of the cycle, and the number $p$ is called
the length of the cycle.


Remark. A similar definition can be made for an arbitrary operator. Then
all vectors $v_k$ must belong to the same generalized eigenspace $E_\lambda$ and they
must satisfy the identities

$$(A - \lambda I)v_1 = 0, \qquad (A - \lambda I)v_{k+1} = v_k, \quad k = 1, 2, \ldots, p - 1.$$


Theorem 4.1. Let $A$ be a nilpotent operator, and let $C_1, C_2, \ldots, C_r$ be cycles
of its generalized eigenvectors, $C_k = v_1^k, v_2^k, \ldots, v_{p_k}^k$, $p_k$ being the length of
the cycle $C_k$. Assume that the initial vectors $v_1^1, v_1^2, \ldots, v_1^r$ are linearly
independent. Then no vector belongs to two cycles, and the union of all the
vectors from all the cycles is a linearly independent system.


Proof. Let $n = p_1 + p_2 + \ldots + p_r$ be the total number of vectors in all the
cycles². We will use induction in $n$. If $n = 1$ the theorem is trivial.


Let us now assume that the theorem is true for all operators and for all
collections of cycles, as long as the total number of vectors in all the cycles
is strictly less than $n$.

Without loss of generality we can assume that the vectors $v_j^k$ span the
whole space $V$, because otherwise we can consider instead of the operator
$A$ its restriction onto the invariant subspace $\operatorname{span}\{v_j^k : k = 1, 2, \ldots, r,\ 1 \le
j \le p_k\}$.

Consider the subspace $\operatorname{Ran} A$. It follows from the relations (4.2), (4.3)
that vectors $v_j^k : k = 1, 2, \ldots, r,\ 1 \le j \le p_k - 1$ span $\operatorname{Ran} A$. Note that if
$p_k > 1$ then the system $v_1^k, v_2^k, \ldots, v_{p_k - 1}^k$ is a cycle, and that $A$ annihilates
any cycle of length 1.


Therefore, we have finitely many cycles, and initial vectors of these cycles
are linearly independent, so the induction hypothesis applies, and the vectors
$v_j^k : k = 1, 2, \ldots, r,\ 1 \le j \le p_k - 1$ are linearly independent. Since these
vectors also span $\operatorname{Ran} A$, we have a basis there. Therefore,

$$\operatorname{rank} A = \dim \operatorname{Ran} A = n - r$$

(we had $n$ vectors, and we removed one vector $v_{p_k}^k$ from each cycle $C_k$,
$k = 1, 2, \ldots, r$, so we have $n - r$ vectors in the basis $v_j^k : k = 1, 2, \ldots, r,\ 1 \le
j \le p_k - 1$). On the other hand $Av_1^k = 0$ for $k = 1, 2, \ldots, r$, and since these
vectors are linearly independent $\dim \operatorname{Ker} A \ge r$. By the Rank Theorem
(Theorem 7.1 from Chapter 2)

$$\dim V = \operatorname{rank} A + \dim \operatorname{Ker} A = (n - r) + \dim \operatorname{Ker} A \ge (n - r) + r = n,$$

so $\dim V \ge n$.


On the other hand $V$ is spanned by $n$ vectors, so $\dim V \le n$ and hence
$\dim V = n$. Therefore the vectors $v_j^k$, $k = 1, 2, \ldots, r$, $1 \le j \le p_k$, form a basis,
so they are linearly independent.


² Here we just count vectors in each cycle, and add all the numbers. We do not care if some
cycles have a common vector, we count this vector in each cycle it belongs to (of course, according
to the theorem, it is impossible, but initially we cannot assume that).


4.2. Jordan canonical form of a nilpotent operator. 


Theorem 4.2. Let $A : V \to V$ be a nilpotent operator. Then $V$ has a basis
consisting of a union of cycles of generalized eigenvectors of the operator $A$.


Proof. We will use induction in $n$ where $n = \dim V$. For $n = 1$ the theorem
is trivial.

Assume that the theorem is true for any operator acting in a space of
dimension strictly less than $n$.


Consider the subspace $X = \operatorname{Ran} A$. $X$ is an invariant subspace of the
operator $A$, so we can consider the restriction $A|_X$.


Since $A$ is not invertible, $\dim \operatorname{Ran} A < \dim V$, so by the induction
hypothesis there exist cycles $C_1, C_2, \ldots, C_r$ of generalized eigenvectors such that
their union is a basis in $X$. Let $C_k = v_1^k, v_2^k, \ldots, v_{p_k}^k$, where $v_1^k$ is the initial
vector of the cycle.

Since the end vector $v_{p_k}^k$ belongs to $\operatorname{Ran} A$, one can find a vector $v_{p_k+1}^k$
such that $Av_{p_k+1}^k = v_{p_k}^k$. So we can extend each cycle $C_k$ to a bigger cycle
$\widetilde{C}_k = v_1^k, v_2^k, \ldots, v_{p_k}^k, v_{p_k+1}^k$. Since the initial vectors $v_1^k$ of cycles $\widetilde{C}_k$, $k =
1, 2, \ldots, r$ are linearly independent, the above Theorem 4.1 implies that the
union of these cycles is a linearly independent system.


By the definition of the cycle we have $v_1^k \in \operatorname{Ker} A$, and we assumed
that the initial vectors $v_1^k$, $k = 1, 2, \ldots, r$ are linearly independent. Let us
complete this system to a basis in $\operatorname{Ker} A$, i.e. let us find vectors $u_1, u_2, \ldots, u_q$
such that the system $v_1^1, v_1^2, \ldots, v_1^r, u_1, u_2, \ldots, u_q$ is a basis in $\operatorname{Ker} A$ (it may
happen that the system $v_1^k$, $k = 1, 2, \ldots, r$ is already a basis in $\operatorname{Ker} A$, in
which case we put $q = 0$ and add nothing).

The vector $u_j$ can be treated as a cycle of length 1, so we have a collection
of cycles $\widetilde{C}_1, \widetilde{C}_2, \ldots, \widetilde{C}_r, u_1, u_2, \ldots, u_q$, whose initial vectors are linearly
independent. So, we can apply Theorem 4.1 to get that the union of all
these cycles is a linearly independent system.
To show that it is a basis, let us count the dimensions. We know that
the cycles $C_1, C_2, \ldots, C_r$ have $\dim \operatorname{Ran} A = \operatorname{rank} A$ vectors total. Each cycle
$\widetilde{C}_k$ was obtained from $C_k$ by adding 1 vector to it, so the total number of
vectors in all the cycles $\widetilde{C}_k$ is $\operatorname{rank} A + r$.


We know that $\dim \operatorname{Ker} A = r + q$ (because $v_1^1, v_1^2, \ldots, v_1^r, u_1, u_2, \ldots, u_q$
is a basis there). We added to the cycles $\widetilde{C}_1, \widetilde{C}_2, \ldots, \widetilde{C}_r$ additional $q$ vectors,
so we got

$$\operatorname{rank} A + r + q = \operatorname{rank} A + \dim \operatorname{Ker} A = \dim V$$

linearly independent vectors. But $\dim V$ linearly independent vectors form a
basis.


Definition. A basis consisting of a union of cycles of generalized
eigenvectors of a nilpotent operator $A$ (existence of which is guaranteed by
Theorem 4.2) is called a Jordan canonical basis for $A$.


Note that such a basis is not unique.


Corollary 4.3. Let $A$ be a nilpotent operator. There exists a basis (a Jordan
canonical basis) such that the matrix of $A$ in this basis is a block diagonal
$\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where all $A_k$ (except maybe one) are blocks of form
(4.1), and one of the blocks $A_k$ can be zero.


The matrix of A in a Jordan canonical basis is called the Jordan canoni- 
cal form of the operator A. We will see later that the Jordan canonical form 
is unique, if we agree on how to order the blocks (i.e. on how to order the 
vectors in the basis). 


Proof of Corollary 4.3. According to Theorem 4.2 one can find a basis
consisting of a union of cycles of generalized eigenvectors. A cycle of length
$p > 1$ gives rise to a $p \times p$ diagonal block of form (4.1), and a cycle of length 1
corresponds to a $1 \times 1$ zero block. We can join these $1 \times 1$ zero blocks in one
large zero block (because off-diagonal entries are 0).


4.3. Dot diagrams. Uniqueness of the Jordan canonical form.
There is a good way of visualizing Theorem 4.2 and Corollary 4.3, the so-called
dot diagrams. This method also allows us to answer many natural
questions, like "is the block diagonal representation given by Corollary 4.3
unique?"

Of course, if we treat this question literally, the answer is “no”, for we 
always can change the order of the blocks. But, if we exclude such trivial 
possibilities, for example by agreeing on some order of blocks (say, if we put 
all non-zero blocks in decreasing order, and then put the zero block), is the 
representation unique, or not? 
Figure 1. Dot diagram and corresponding Jordan canonical form of a
nilpotent operator


To better understand the structure of nilpotent operators, described in
Section 4.1, let us draw the so-called dot diagram. Namely, suppose we
have a basis, which is a union of cycles of generalized eigenvectors. Let us
represent the basis by an array of dots, so that each column represents a
cycle. The first row consists of initial vectors of cycles, and we arrange the
columns (cycles) by their length, putting the longest one to the left.


In Figure 1 we have the dot diagram of a nilpotent operator, as
well as its Jordan canonical form. This dot diagram shows that the basis
has one cycle of length 5, one cycle of length 3, two cycles of length 2, and two
cycles of length 1. The cycle of length 5 corresponds to the $5 \times 5$ block of
the matrix, the cycle of length 3 corresponds to the $3 \times 3$ non-zero block, and the two
cycles of length 2 correspond to two $2 \times 2$ blocks. The two cycles of length 1
correspond to two zero entries on the diagonal. Here in each block we are only
giving the main diagonal and the diagonal above it; all other entries of the
matrix are zero.


If we agree on the ordering of the blocks, there is a one-to-one
correspondence between dot diagrams and Jordan canonical forms (for nilpotent
operators). So, the question about uniqueness of the Jordan canonical form
is equivalent to the question about uniqueness of the dot diagram.

To answer this question, let us analyze how the operator $A$ transforms
the dot diagram. Since the operator $A$ annihilates initial vectors of the
cycles, and moves vector $v_{k+1}$ of a cycle to the vector $v_k$, we can see that
the operator $A$ acts on its dot diagram by deleting the first (top) row of the
diagram.

The new dot diagram corresponds to a Jordan canonical basis in $\operatorname{Ran} A$,
and allows us to write down the Jordan canonical form for the restriction
$A|_{\operatorname{Ran} A}$.

Similarly, it is not hard to see that the operator $A^k$ removes the first
$k$ rows of the dot diagram. Therefore, if for all $k$ we know the dimensions
$\dim \operatorname{Ker}(A^k)$, we know the dot diagram of the operator $A$. Namely, the
number of dots in the first row is $\dim \operatorname{Ker} A$, the number of dots in the
second row is

$$\dim \operatorname{Ker}(A^2) - \dim \operatorname{Ker} A,$$

and the number of dots in the $k$th row is

$$\dim \operatorname{Ker}(A^k) - \dim \operatorname{Ker}(A^{k-1}).$$


But this means that the dot diagram, which was initially defined using
a Jordan canonical basis, does not depend on a particular choice of such a
basis. Therefore, the dot diagram is unique! This implies that if we agree
on the order of the blocks, then the Jordan canonical form is unique.
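The row counts above translate directly into a small computation. The following sketch (plain Python, exact rational arithmetic) reads off the rows of the dot diagram from $\dim \operatorname{Ker}(A^k)$ and then the cycle lengths from the column heights; the example matrix and the helper names `dot_diagram_rows` and `cycle_lengths` are assumptions made for illustration.

```python
from fractions import Fraction

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank(M):
    """Rank of M by Gaussian elimination over exact rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def dot_diagram_rows(A):
    """Number of dots in each row: dim Ker(A^k) - dim Ker(A^(k-1)), k = 1, 2, ..."""
    n = len(A)
    rows, prev_dim, power = [], 0, A
    while prev_dim < n:
        dim_ker = n - rank(power)
        if dim_ker == prev_dim:                  # kernels stopped growing before reaching V
            raise ValueError("the matrix is not nilpotent")
        rows.append(dim_ker - prev_dim)
        prev_dim = dim_ker
        power = mat_mul(power, A)
    return rows

def cycle_lengths(rows):
    """Column heights of the diagram, i.e. the lengths of the cycles."""
    return [sum(1 for r in rows if r > c) for c in range(rows[0])]

# A strictly upper triangular (hence nilpotent) matrix that is not in Jordan form.
A = [[0, 1, 2, 3],
     [0, 0, 0, 4],
     [0, 0, 0, 5],
     [0, 0, 0, 0]]

rows = dot_diagram_rows(A)
print(rows, cycle_lengths(rows))
```

A cycle of length $p > 1$ corresponds to a $p \times p$ block of form (4.1) and a cycle of length 1 to a zero entry, so the list of cycle lengths determines the Jordan canonical form without computing any basis.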


4.4. Computing a Jordan canonical basis. Let us say a few words about
computing a Jordan canonical basis for a nilpotent operator. Let $p_1$ be the
largest positive integer such that $A^{p_1 - 1} \ne 0$ (so $A^{p_1} = 0$). Equivalently, we
can say that $p_1$ is the smallest non-negative integer such that $A^{p_1} = 0$ (and
so $A^{p_1 - 1} \ne 0$). One can see from the above analysis of dot diagrams that $p_1$
is the length of the longest cycle (so this also can be used as the definition
of $p_1$).

Computing operators $A^k$, $k = 1, 2, \ldots, p_1 - 1$, and counting $\dim \operatorname{Ker}(A^k)$
we can construct the dot diagram of $A$. Namely, $\dim \operatorname{Ker} A$ gives us the
number of dots in the top row, $\dim \operatorname{Ker} A^2 - \dim \operatorname{Ker} A$ the number of dots
in the next row, etc.

Now we want to put vectors instead of dots and find a basis which is a 
union of cycles. 

We start by finding the longest cycles (because we know the dot diagram,
we know how many cycles should be there, and what is the length of each
cycle). Consider a basis in the column space $\operatorname{Ran}(A^{p_1 - 1})$. Name the vectors
in this basis $v_1^1, v_1^2, \ldots, v_1^{r_1}$; these will be the initial vectors of the cycles.
Then we find the end vectors of the cycles $v_{p_1}^1, v_{p_1}^2, \ldots, v_{p_1}^{r_1}$ by solving the
equations

$$A^{p_1 - 1} v_{p_1}^k = v_1^k, \qquad k = 1, 2, \ldots, r_1.$$

Applying consecutively the operator $A$ to the end vector $v_{p_1}^k$ we get all the
vectors $v_j^k$ in the cycle. Thus, we have constructed all cycles of maximal
length.
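The last step, recovering a whole cycle from its end vector, can be sketched as follows. This is an illustration only: the matrix is an example chosen here, the end vector is assumed to have been found already by solving $A^{p_1 - 1} v = v_1$ by hand, and the helper names are hypothetical.

```python
def mat_vec(A, v):
    """Product of a square matrix and a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def cycle_from_end_vector(A, end_vector):
    """Return [v_1, ..., v_p] with v_p = end_vector and v_k = A v_{k+1}."""
    chain = [list(end_vector)]
    while True:
        nxt = mat_vec(A, chain[-1])
        if all(x == 0 for x in nxt):          # reached the initial vector: A v_1 = 0
            break
        chain.append(nxt)
        if len(chain) > len(A):               # cannot happen for a nilpotent matrix
            raise ValueError("the matrix does not look nilpotent")
    chain.reverse()                           # order the cycle as v_1, v_2, ..., v_p
    return chain

A = [[0, 1, 2, 3],
     [0, 0, 0, 4],
     [0, 0, 0, 5],
     [0, 0, 0, 0]]       # nilpotent, A^3 = 0 and A^2 != 0, so p_1 = 3

# Ran(A^2) is spanned by (14, 0, 0, 0); the end vector e_4 solves A^2 v = (14, 0, 0, 0).
end_vector = [0, 0, 0, 1]
cycle = cycle_from_end_vector(A, end_vector)
print(cycle)
```

The resulting vectors satisfy relations (4.2) and (4.3) by construction: each vector is the image of the next one, and the first one is annihilated by $A$.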


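Let $p_2$ be the length of a maximal cycle among those that are left to
find. Consider the subspace $\operatorname{Ran}(A^{p_2 - 1}) \cap \operatorname{Ker} A$ (by (4.2) the initial vectors of
cycles must belong to $\operatorname{Ker} A$), and let $r_2$ be its dimension.
Since $\operatorname{Ran}(A^{p_1 - 1}) \subset \operatorname{Ran}(A^{p_2 - 1}) \cap \operatorname{Ker} A$, we can complete the basis $v_1^1, v_1^2, \ldots, v_1^{r_1}$
in $\operatorname{Ran}(A^{p_1 - 1})$ to a basis $v_1^1, v_1^2, \ldots, v_1^{r_1}, v_1^{r_1 + 1}, \ldots, v_1^{r_2}$ in $\operatorname{Ran}(A^{p_2 - 1}) \cap \operatorname{Ker} A$. Then
we find end vectors of the cycles $C_{r_1 + 1}, \ldots, C_{r_2}$ by solving (for $v_{p_2}^k$) the
equations

$$A^{p_2 - 1} v_{p_2}^k = v_1^k, \qquad k = r_1 + 1, r_1 + 2, \ldots, r_2,$$

thus constructing the cycles of length $p_2$.


Let $p_3$ denote the length of a maximal cycle among ones left. Then,
completing the basis $v_1^1, v_1^2, \ldots, v_1^{r_2}$ in $\operatorname{Ran}(A^{p_2 - 1}) \cap \operatorname{Ker} A$ to a basis in
$\operatorname{Ran}(A^{p_3 - 1}) \cap \operatorname{Ker} A$ we construct the cycles of length $p_3$, and so on...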

One final remark: as we discussed above, if we know the dot diagram, we 
know the canonical form, so after we have found a Jordan canonical basis, 
we do not need to compute the matrix of A in this basis: we already know 
it! 


5. Jordan decomposition theorem 


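Theorem 5.1. Given an operator $A$ there exists a basis (Jordan canonical
basis) such that the matrix of $A$ in this basis has a block diagonal form with
blocks of form

$$\begin{pmatrix} \lambda & 1 & & & 0 \\ & \lambda & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda & 1 \\ 0 & & & & \lambda \end{pmatrix} \tag{5.1}$$

where $\lambda$ is an eigenvalue of $A$. Here we assume that the block of size 1 is
just $\lambda$.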


The block diagonal form from Theorem 5.1 is called the Jordan canonical 
form of the operator A. The corresponding basis is called a Jordan canonical 
basis for an operator A. 


Proof of Theorem 5.1. According to Theorem 3.4 and Remark 3.5, if
we join bases in the generalized eigenspaces $E_k = E_{\lambda_k}$ to get a basis in
the whole space, the matrix of $A$ in this basis has a block diagonal form
$\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where $A_k = A|_{E_k}$. The operators $N_k = A_k - \lambda_k I_{E_k}$
are nilpotent, so by Theorem 4.2 (more precisely, by Corollary 4.3) one can
find a basis in $E_k$ such that the matrix of $N_k$ in this basis is the Jordan
canonical form of $N_k$. To get the matrix of $A_k$ in this basis one just puts $\lambda_k$
instead of 0 on the main diagonal.


5.1. Remarks about computing Jordan canonical basis. First of all 
let us recall that the computing of eigenvalues is the hardest part, but here 
we do not discuss this part, and assume that eigenvalues are already com- 
puted. 


For each eigenvalue $\lambda$ we compute subspaces $\operatorname{Ker}(A - \lambda I)^k$, $k = 1, 2, \ldots$
until the sequence of the subspaces stabilizes. In fact, since we have an
increasing sequence of subspaces ($\operatorname{Ker}(A - \lambda I)^k \subset \operatorname{Ker}(A - \lambda I)^{k+1}$), it
is sufficient only to keep track of their dimensions (or ranks of the operators
$(A - \lambda I)^k$). For an eigenvalue $\lambda$ let $m = m_\lambda$ be the number where the
sequence $\operatorname{Ker}(A - \lambda I)^k$ stabilizes, i.e. $m$ satisfies

$$\dim \operatorname{Ker}(A - \lambda I)^{m-1} < \dim \operatorname{Ker}(A - \lambda I)^m = \dim \operatorname{Ker}(A - \lambda I)^{m+1}.$$

Then $E_\lambda = \operatorname{Ker}(A - \lambda I)^m$ is the generalized eigenspace corresponding to the
eigenvalue $\lambda$.


After we have computed all the generalized eigenspaces there are two possible
ways of action. The first way is to find a basis in each generalized eigenspace,
so the matrix of the operator $A$ in this basis has the block-diagonal form
$\operatorname{diag}\{A_1, A_2, \ldots, A_r\}$, where $A_k = A|_{E_{\lambda_k}}$. Then we can deal with each
matrix $A_k$ separately. The operators $N_k = A_k - \lambda_k I$ are nilpotent, so applying
the algorithm described in Section 4.4 we get the Jordan canonical
representation for $N_k$, and putting $\lambda_k$ instead of 0 on the main diagonal, we get
the Jordan canonical representation for the block $A_k$. The advantage of this
approach is that we are working with smaller blocks. But we need to find
the matrix of the operator in a new basis, which involves inverting a matrix
and matrix multiplication.


Another way is to find a Jordan canonical basis in each of the generalized
eigenspaces $E_{\lambda_k}$ by working directly with the operator $A$, without splitting
it first into the blocks. Again, the algorithm we outlined above in Section
4.4 works with a slight modification. Namely, when computing a Jordan
canonical basis for a generalized eigenspace $E_{\lambda_k}$, instead of considering
subspaces $\operatorname{Ran}(A_k - \lambda_k I)^j$, which we would need to consider when working with
the block $A_k$ separately, we consider the subspaces $(A - \lambda_k I)^j E_{\lambda_k}$.